Preface


The response of students and teachers to the first three editions of Linear Algebra and 
Its Applications has been most gratifying. This Fourth Edition provides substantial 
support both for teaching and for using technology in the course. As before, the text 
provides a modern elementary introduction to linear algebra and a broad selection of 
interesting applications. The material is accessible to students with the maturity that 
should come from successful completion of two semesters of college-level mathematics, 
usually calculus. 

The main goal of the text is to help students master the basic concepts and skills they 
will use later in their careers. The topics here follow the recommendations of the Linear 
Algebra Curriculum Study Group, which were based on a careful investigation of the 
real needs of the students and a consensus among professionals in many disciplines that 
use linear algebra. Hopefully, this course will be one of the most useful and interesting 
mathematics classes taken by undergraduates. 


WHAT'S NEW IN THIS EDITION 

The main goal of this revision was to update the exercises and provide additional content,
both in the book and online.

1. More than 25 percent of the exercises are new or updated, especially the computational
exercises. The exercise sets remain one of the most important features of this
book, and these new exercises follow the same high standard of the exercise sets of 
the past three editions. They are crafted in a way that retells the substance of each 
of the sections they follow, developing the students’ confidence while challenging 
them to practice and generalize the new ideas they have just encountered. 

2. Twenty-five percent of chapter openers are new. These introductory vignettes provide
applications of linear algebra and the motivation for developing the mathematics
that follows. The text returns to that application in a section toward the end of the 
chapter. 

3. A New Chapter: Chapter 8, The Geometry of Vector Spaces, provides a fresh topic 
that my students have really enjoyed studying. Sections 1, 2, and 3 provide the basic 
geometric tools. Then Section 6 uses these ideas to study Bézier curves and surfaces,
which are used in engineering and online computer graphics (in Adobe® Illustrator® 
and Macromedia® FreeHand®). These four sections can be covered in four or five 
50-minute class periods. 

A second course in linear algebra applications typically begins with a substantial 
review of key ideas from the first course. If part of Chapter 8 is in the first course, 
the second course could include a brief review of sections 1 to 3 and then a focus on 
the geometry in sections 4 and 5. That would lead naturally into the online chapters 
9 and 10, which have been used with Chapter 8 at a number of schools for the past 
five years. 

4. The Study Guide, which has always been an integral part of the book, has been updated
to cover the new Chapter 8. As with past editions, the Study Guide incorporates
detailed solutions to every third odd-numbered exercise as well as solutions to every 
odd-numbered writing exercise for which the text only provides a hint. 

5. Two new chapters are now available online, and can be used in a second course: 

Chapter 9. Optimization 

Chapter 10. Finite-State Markov Chains 

An access code is required and is available to qualified adopters. For more information,
visit www.pearsonhighered.com/irc or contact your Pearson representative.

6. PowerPoint® slides are now available for the 25 core sections of the text; also included
are 75 figures from the text.

DISTINCTIVE FEATURES 

Early Introduction of Key Concepts 

Many fundamental ideas of linear algebra are introduced within the first seven lectures, 
in the concrete setting of $\mathbb{R}^n$, and then gradually examined from different points of view.
Later generalizations of these concepts appear as natural extensions of familiar ideas, 
visualized through the geometric intuition developed in Chapter 1. A major achievement 
of this text is that the level of difficulty is fairly even throughout the course. 

A Modern View of Matrix Multiplication 

Good notation is crucial, and the text reflects the way scientists and engineers actually 
use linear algebra in practice. The definitions and proofs focus on the columns of a matrix
rather than on the matrix entries. A central theme is to view a matrix-vector product
Ax as a linear combination of the columns of A. This modern approach simplifies many
arguments, and it ties vector space ideas into the study of linear systems. 

Linear Transformations 

Linear transformations form a “thread” that is woven into the fabric of the text. Their 
use enhances the geometric flavor of the text. In Chapter 1, for instance, linear transformations
provide a dynamic and graphical view of matrix-vector multiplication.

Eigenvalues and Dynamical Systems 

Eigenvalues appear fairly early in the text, in Chapters 5 and 7. Because this material 
is spread over several weeks, students have more time than usual to absorb and review 
these critical concepts. Eigenvalues are motivated by and applied to discrete and continuous
dynamical systems, which appear in Sections 1.10, 4.8, and 4.9, and in five
sections of Chapter 5. Some courses reach Chapter 5 after about five weeks by covering 
Sections 2.8 and 2.9 instead of Chapter 4. These two optional sections present all the 
vector space concepts from Chapter 4 needed for Chapter 5. 

Orthogonality and Least-Squares Problems 

These topics receive a more comprehensive treatment than is commonly found in beginning
texts. The Linear Algebra Curriculum Study Group has emphasized the need for
a substantial unit on orthogonality and least-squares problems, because orthogonality 
plays such an important role in computer calculations and numerical linear algebra and 
because inconsistent linear systems arise so often in practical work.

PEDAGOGICAL FEATURES


Applications 

A broad selection of applications illustrates the power of linear algebra to explain fundamental
principles and simplify calculations in engineering, computer science, mathematics,
physics, biology, economics, and statistics. Some applications appear in separate
sections; others are treated in examples and exercises. In addition, each chapter
opens with an introductory vignette that sets the stage for some application of linear 
algebra and provides a motivation for developing the mathematics that follows. Later, 
the text returns to that application in a section near the end of the chapter. 

A Strong Geometric Emphasis 

Every major concept in the course is given a geometric interpretation, because many 
students learn better when they can visualize an idea. There are substantially more 
drawings here than usual, and some of the figures have never before appeared in a linear 
algebra text. 

Examples 

This text devotes a larger proportion of its expository material to examples than do 
most linear algebra texts. There are more examples than an instructor would ordinarily 
present in class. But because the examples are written carefully, with lots of detail, 
students can read them on their own. 

Theorems and Proofs 

Important results are stated as theorems. Other useful facts are displayed in tinted boxes, 
for easy reference. Most of the theorems have formal proofs, written with the beginning 
student in mind. In a few cases, the essential calculations of a proof are exhibited in a 
carefully chosen example. Some routine verifications are saved for exercises, when they 
will benefit students. 

Practice Problems 

A few carefully selected Practice Problems appear just before each exercise set. Complete
solutions follow the exercise set. These problems either focus on potential trouble
spots in the exercise set or provide a “warm-up” for the exercises, and the solutions 
often contain helpful hints or warnings about the homework. 

Exercises 


The abundant supply of exercises ranges from routine computations to conceptual questions
that require more thought. A good number of innovative questions pinpoint conceptual
difficulties that I have found on student papers over the years. Each exercise
set is carefully arranged in the same general order as the text; homework assignments 
are readily available when only part of a section is discussed. A notable feature of the 
exercises is their numerical simplicity. Problems “unfold” quickly, so students spend 
little time on numerical calculations. The exercises concentrate on teaching understanding
rather than mechanical calculations. The exercises in the Fourth Edition maintain
the integrity of the exercises from the third edition, while providing fresh problems for 
students and instructors. 

Exercises marked with the symbol [M] are designed to be worked with the aid of a 
“Matrix program” (a computer program, such as MATLAB®, Maple™, Mathematica®,
MathCad®, or Derive™, or a programmable calculator with matrix capabilities, such as 
those manufactured by Texas Instruments).

True/False Questions

To encourage students to read all of the text and to think critically, I have developed 300 
simple true/false questions that appear in 33 sections of the text, just after the computational
problems. They can be answered directly from the text, and they prepare students
for the conceptual problems that follow. Students appreciate these questions — after 
they get used to the importance of reading the text carefully. Based on class testing 
and discussions with students, I decided not to put the answers in the text. (The Study 
Guide tells the students where to find the answers to the odd-numbered questions.) An 
additional 150 true/false questions (mostly at the ends of chapters) test understanding 
of the material. The text does provide simple T/F answers to most of these questions, 
but it omits the justifications for the answers (which usually require some thought). 

Writing Exercises 

An ability to write coherent mathematical statements in English is essential for all students
of linear algebra, not just those who may go to graduate school in mathematics.
The text includes many exercises for which a written justification is part of the answer. 
Conceptual exercises that require a short proof usually contain hints that help a student 
get started. For all odd-numbered writing exercises, either a solution is included at 
the back of the text or a hint is provided and the solution is given in the Study Guide,
described below. 

Computational Topics 

The text stresses the impact of the computer on both the development and practice of 
linear algebra in science and engineering. Frequent Numerical Notes draw attention 
to issues in computing and distinguish between theoretical concepts, such as matrix 
inversion, and computer implementations, such as LU factorizations. 


WEB SUPPORT 


The Web site at www.pearsonhighered.com/lay contains support material for the textbook.
For students, the Web site contains review sheets and practice exams (with
solutions) that cover the main topics in the text. They come directly from courses I 
have taught in past years. Each review sheet identifies key definitions, theorems, and 
skills from a specified portion of the text. 

Applications by Chapters 

The Web site also contains seven Case Studies, which expand topics introduced at the 
beginning of each chapter, adding real-world data and opportunities for further exploration.
In addition, more than 20 Application Projects either extend topics in the text or
introduce new applications, such as cubic splines, airline flight routes, dominance matrices
in sports competition, and error-correcting codes. Some mathematical applications
are integration techniques, polynomial root location, conic sections, quadric surfaces, 
and extrema for functions of two variables. Numerical linear algebra topics, such as 
condition numbers, matrix factorizations, and the QR method for finding eigenvalues, 
are also included. Woven into each discussion are exercises that may involve large data 
sets (and thus require technology for their solution). 

Getting Started with Technology 

If your course includes some work with MATLAB, Maple, Mathematica, or TI calculators,
you can read one of the projects on the Web site for an introduction to the
technology. In addition, the Study Guide provides introductory material for first-time
users. 


Data Files 

Hundreds of files contain data for about 900 numerical exercises in the text, Case Studies,
and Application Projects. The data are available at www.pearsonhighered.com/lay
in a variety of formats—for MATLAB, Maple, Mathematica, and the TI-83+/86/89 
graphic calculators. By allowing students to access matrices and vectors for a particular 
problem with only a few keystrokes, the data files eliminate data entry errors and save 
time on homework. 

MATLAB Projects 

These exploratory projects invite students to discover basic mathematical and numerical 
issues in linear algebra. Written by Rick Smith, they were developed to accompany a 
computational linear algebra course at the University of Florida, which has used Linear 
Algebra and Its Applications for many years. The projects are referenced by an icon 
| web | at appropriate points in the text. About half of the projects explore fundamental
concepts such as the column space, diagonalization, and orthogonal projections; several 
projects focus on numerical issues such as flops, iterative methods, and the SVD; and a 
few projects explore applications such as Lagrange interpolation and Markov chains. 


SUPPLEMENTS 


Study Guide 

A printed version of the Study Guide is available at low cost. I wrote this Guide to 
be an integral part of the course. An icon | sg | in the text directs students to special 
subsections of the Guide that suggest how to master key concepts of the course. The 
Guide supplies a detailed solution to every third odd-numbered exercise, which allows 
students to check their work. A complete explanation is provided whenever an odd-numbered
writing exercise has only a “Hint” in the answers. Frequent “Warnings”
identify common errors and show how to prevent them. MATLAB boxes introduce 
commands as they are needed. Appendixes in the Study Guide provide comparable 
information about Maple, Mathematica, and TI graphing calculators (ISBN:
0-321-38883-6).

Instructor’s Edition 

For the convenience of instructors, this special edition includes brief answers to all 
exercises. A Note to the Instructor at the beginning of the text provides a commentary 
on the design and organization of the text, to help instructors plan their courses. It also 
describes other support available for instructors. (ISBN: 0-321-38518-7) 

Instructor’s Technology Manuals 

Each manual provides detailed guidance for integrating a specific software package or 
graphic calculator throughout the course, written by faculty who have already used the 
technology with this text. The following manuals are available to qualified instructors 
through the Pearson Instructor Resource Center, www.pearsonhighered.com/irc: MATLAB
(ISBN: 0-321-53365-8), Maple (ISBN: 0-321-75605-3), Mathematica (ISBN:
0-321-38885-2), and the TI-83+/86/89 (ISBN: 0-321-38887-9).

A Note to Students


This course is potentially the most interesting and worthwhile undergraduate mathematics
course you will complete. In fact, some students have written or spoken to me
after graduation and said that they still use this text occasionally as a reference in their 
careers at major corporations and engineering graduate schools. The following remarks 
offer some practical advice and information to help you master the material and enjoy 
the course. 

In linear algebra, the concepts are as important as the computations. The simple 
numerical exercises that begin each exercise set only help you check your understanding 
of basic procedures. Later in your career, computers will do the calculations, but you 
will have to choose the calculations, know how to interpret the results, and then explain 
the results to other people. For this reason, many exercises in the text ask you to explain 
or justify your calculations. A written explanation is often required as part of the answer. 
For odd-numbered exercises, you will find either the desired explanation or at least a 
good hint. You must avoid the temptation to look at such answers before you have tried 
to write out the solution yourself. Otherwise, you are likely to think you understand 
something when in fact you do not. 

To master the concepts of linear algebra, you will have to read and reread the text 
carefully. New terms are in boldface type, sometimes enclosed in a definition box. A 
glossary of terms is included at the end of the text. Important facts are stated as theorems 
or are enclosed in tinted boxes, for easy reference. I encourage you to read the first five 
pages of the Preface to learn more about the structure of this text. This will give you a 
framework for understanding how the course may proceed. 

In a practical sense, linear algebra is a language. You must learn this language the 
same way you would a foreign language—with daily work. Material presented in one 
section is not easily understood unless you have thoroughly studied the text and worked 
the exercises for the preceding sections. Keeping up with the course will save you lots 
of time and distress! 

Numerical Notes 

I hope you read the Numerical Notes in the text, even if you are not using a computer or 
graphic calculator with the text. In real life, most applications of linear algebra involve 
numerical computations that are subject to some numerical error, even though that error 
may be extremely small. The Numerical Notes will warn you of potential difficulties in 
using linear algebra later in your career, and if you study the notes now, you are more 
likely to remember them later. 

If you enjoy reading the Numerical Notes, you may want to take a course later in 
numerical linear algebra. Because of the high demand for increased computing power, 
computer scientists and mathematicians work in numerical linear algebra to develop 
faster and more reliable algorithms for computations, and electrical engineers design 
faster and smaller computers to run the algorithms. This is an exciting field, and your 
first course in linear algebra will help you prepare for it.

Study Guide

To help you succeed in this course, I suggest that you purchase the Study Guide 
(www.mypearsonstore.com, 0-321-38883-6). Not only will it help you learn linear
algebra, it also will show you how to study mathematics. At strategic points in your 
textbook, an icon | sg | will direct you to special subsections in the Study Guide entitled 
“Mastering Linear Algebra Concepts.” There you will find suggestions for constructing 
effective review sheets of key concepts. The act of preparing the sheets is one of 
the secrets to success in the course, because you will construct links between ideas. 
These links are the “glue” that enables you to build a solid foundation for learning and 
remembering the main concepts in the course. 

The Study Guide contains a detailed solution to every third odd-numbered exercise, 
plus solutions to all odd-numbered writing exercises for which only a hint is given in 
the Answers section of this book. The Guide is separate from the text because you 
must learn to write solutions by yourself, without much help. (I know from years of 
experience that easy access to solutions in the back of the text slows the mathematical 
development of most students.) The Guide also provides warnings of common errors 
and helpful hints that call attention to key exercises and potential exam questions. 

If you have access to technology—MATLAB, Maple, Mathematica, or a TI 
graphing calculator—you can save many hours of homework time. The Study Guide 
is your “lab manual” that explains how to use each of these matrix utilities. It 
introduces new commands when they are needed. You can download from the website 
www.pearsonhighered.com/lay the data for more than 850 exercises in the text. (With 
a few keystrokes, you can display any numerical homework problem on your screen.) 
Special matrix commands will perform the computations for you! 

What you do in your first few weeks of studying this course will set your pattern 
for the term and determine how well you finish the course. Please read “How to Study 
Linear Algebra” in the Study Guide as soon as possible. My students have found the 
strategies there very helpful, and I hope you will, too. 




Linear Equations in 
Linear Algebra 



INTRODUCTORY EXAMPLE 

Linear Models in Economics 
and Engineering 

It was late summer in 1949. Harvard Professor Wassily 
Leontief was carefully feeding the last of his punched 
cards into the university’s Mark II computer. The cards 
contained economic information about the U.S. economy 
and represented a summary of more than 250,000 pieces 
of information produced by the U.S. Bureau of Labor 
Statistics after two years of intensive work. Leontief had 
divided the U.S. economy into 500 “sectors,” such as the
coal industry, the automotive industry, communications, 
and so on. For each sector, he had written a linear equation 
that described how the sector distributed its output to 
the other sectors of the economy. Because the Mark II, 
one of the largest computers of its day, could not handle 
the resulting system of 500 equations in 500 unknowns, 
Leontief had distilled the problem into a system of 42 
equations in 42 unknowns. 

Programming the Mark II computer for Leontief’s 42 
equations had required several months of effort, and he 
was anxious to see how long the computer would take 
to solve the problem. The Mark II hummed and blinked 
for 56 hours before finally producing a solution. We will 
discuss the nature of this solution in Sections 1.6 and 2.6. 

Leontief, who was awarded the 1973 Nobel Prize 
in Economic Science, opened the door to a new era 
in mathematical modeling in economics. His efforts
at Harvard in 1949 marked one of the first significant
uses of computers to analyze what was then a large-scale
mathematical model. Since that time, researchers
in many other fields have employed computers to analyze 
mathematical models. Because of the massive amounts of 
data involved, the models are usually linear, that is, they 
are described by systems of linear equations. 

The importance of linear algebra for applications has 
risen in direct proportion to the increase in computing 
power, with each new generation of hardware and 
software triggering a demand for even greater capabilities. 
Computer science is thus intricately linked with linear 
algebra through the explosive growth of parallel processing 
and large-scale computations. 

Scientists and engineers now work on problems far 
more complex than even dreamed possible a few decades 
ago. Today, linear algebra has more potential value for 
students in many scientific and business fields than any 
other undergraduate mathematics subject! The material in 
this text provides the foundation for further work in many 
interesting areas. Here are a few possibilities; others will 
be described later. 

• Oil exploration. When a ship searches for offshore 
oil deposits, its computers solve thousands of 
separate systems of linear equations every day. The
seismic data for the equations are obtained from
underwater shock waves created by explosions 
from air guns. The waves bounce off subsurface 
rocks and are measured by geophones attached to 
mile-long cables behind the ship. 

• Linear programming. Many important management 
decisions today are made on the basis of linear 
programming models that utilize hundreds of 
variables. The airline industry, for instance,
employs linear programs that schedule flight crews,
monitor the locations of aircraft, or plan the varied 
schedules of support services such as maintenance 
and terminal operations. 

• Electrical networks. Engineers use simulation 
software to design electrical circuits and microchips 
involving millions of transistors. Such software 
relies on linear algebra techniques and systems of 
linear equations. 



Systems of linear equations lie at the heart of linear algebra, and this chapter uses them to 
introduce some of the central concepts of linear algebra in a simple and concrete setting. 
Sections 1.1 and 1.2 present a systematic method for solving systems of linear equations. 
This algorithm will be used for computations throughout the text. Sections 1.3 and 
1.4 show how a system of linear equations is equivalent to a vector equation and to a 
matrix equation. This equivalence will reduce problems involving linear combinations 
of vectors to questions about systems of linear equations. The fundamental concepts of 
spanning, linear independence, and linear transformations, studied in the second half of 
the chapter, will play an essential role throughout the text as we explore the beauty and 
power of linear algebra. 


1.1 SYSTEMS OF LINEAR EQUATIONS 

A linear equation in the variables $x_1, \dots, x_n$ is an equation that can be written in the
form

\[ a_1x_1 + a_2x_2 + \cdots + a_nx_n = b \tag{1} \]

where $b$ and the coefficients $a_1, \dots, a_n$ are real or complex numbers, usually known
in advance. The subscript n may be any positive integer. In textbook examples and 
exercises, n is normally between 2 and 5. In real-life problems, n might be 50 or 5000, 
or even larger. 

The equations

\[ 4x_1 - 5x_2 + 2 = x_1 \quad\text{and}\quad x_2 = 2\left(\sqrt{6} - x_1\right) + x_3 \]

are both linear because they can be rearranged algebraically as in equation (1):

\[ 3x_1 - 5x_2 = -2 \quad\text{and}\quad 2x_1 + x_2 - x_3 = 2\sqrt{6} \]

The equations

\[ 4x_1 - 5x_2 = x_1x_2 \quad\text{and}\quad x_2 = 2\sqrt{x_1} - 6 \]

are not linear because of the presence of $x_1x_2$ in the first equation and $\sqrt{x_1}$ in the second.

A system of linear equations (or a linear system) is a collection of one or more 
linear equations involving the same variables, say $x_1, \dots, x_n$. An example is

\[
\begin{aligned}
2x_1 - x_2 + 1.5x_3 &= 8 \\
x_1 - 4x_3 &= -7
\end{aligned}
\tag{2}
\]

A solution of the system is a list $(s_1, s_2, \dots, s_n)$ of numbers that makes each equation a
true statement when the values $s_1, \dots, s_n$ are substituted for $x_1, \dots, x_n$, respectively. For
instance, $(5, 6.5, 3)$ is a solution of system (2) because, when these values are substituted
in (2) for $x_1, x_2, x_3$, respectively, the equations simplify to $8 = 8$ and $-7 = -7$.

The set of all possible solutions is called the solution set of the linear system. Two 
linear systems are called equivalent if they have the same solution set. That is, each 
solution of the first system is a solution of the second system, and each solution of the 
second system is a solution of the first. 

Finding the solution set of a system of two linear equations in two variables is easy 
because it amounts to finding the intersection of two lines. A typical problem is 


\[
\begin{aligned}
x_1 - 2x_2 &= -1 \\
-x_1 + 3x_2 &= 3
\end{aligned}
\]

The graphs of these equations are lines, which we denote by $\ell_1$ and $\ell_2$. A pair of numbers
$(x_1, x_2)$ satisfies both equations in the system if and only if the point $(x_1, x_2)$ lies on both
$\ell_1$ and $\ell_2$. In the system above, the solution is the single point $(3, 2)$, as you can easily
verify. See Fig. 1.
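
The verification can also be done mechanically. The following is a minimal sketch in Python, not part of the original text; the function name `satisfies` and the way each equation is stored as a (coefficients, right side) pair are assumptions made for illustration.

```python
def satisfies(equations, point):
    """Return True when `point` makes every equation a true statement.

    Each equation is a pair (coefficients, right_side) standing for
    coefficients[0]*x1 + coefficients[1]*x2 + ... = right_side.
    """
    return all(
        sum(a * x for a, x in zip(coeffs, point)) == rhs
        for coeffs, rhs in equations
    )

# The system  x1 - 2*x2 = -1,  -x1 + 3*x2 = 3  from the text.
system = [([1, -2], -1), ([-1, 3], 3)]

print(satisfies(system, (3, 2)))   # True: (3, 2) lies on both lines
print(satisfies(system, (1, 1)))   # False: (1, 1) lies on the first line only
```

The second call shows why both equations must be checked: a point can lie on one line without being a solution of the system.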





Of course, two lines need not intersect in a single point—they could be parallel, or 
they could coincide and hence “intersect” at every point on the line. Figure 2 shows the 
graphs that correspond to the following systems: 



Figures 1 and 2 illustrate the following general fact about linear systems, to be 
verified in Section 1.2.

A system of linear equations has

1. no solution, or 

2. exactly one solution, or 

3. infinitely many solutions. 

A system of linear equations is said to be consistent if it has either one solution or 
infinitely many solutions; a system is inconsistent if it has no solution. 
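
For two equations in two unknowns, the three cases can be told apart by a short computation. The sketch below is an illustration only: the names `classify` and `is_consistent` are hypothetical, the two sample systems with parallel and coincident lines were chosen for this sketch rather than taken from the text, and the test it uses (comparing cross products of the coefficients) applies only to this two-line setting, not to the general fact stated above.

```python
def classify(eq1, eq2):
    """Classify a system of two linear equations in two unknowns.

    Each equation is (a, b, c), standing for a*x1 + b*x2 = c, and is assumed
    to describe a genuine line (a and b are not both zero).
    """
    (a1, b1, c1), (a2, b2, c2) = eq1, eq2
    if a1 * b2 - a2 * b1 != 0:
        return "exactly one solution"       # the lines cross at one point
    # Same direction: the lines are either identical or parallel.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many solutions"  # the lines coincide
    return "no solution"                    # distinct parallel lines

def is_consistent(eq1, eq2):
    """A system is consistent when it has at least one solution."""
    return classify(eq1, eq2) != "no solution"

print(classify((1, -2, -1), (-1, 3, 3)))   # exactly one solution
print(classify((1, -2, -1), (-1, 2, 3)))   # no solution
print(classify((1, -2, -1), (-1, 2, 1)))   # infinitely many solutions
```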

Matrix Notation 

The essential information of a linear system can be recorded compactly in a rectangular 
array called a matrix. Given the system 

\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 \\
2x_2 - 8x_3 &= 8 \\
-4x_1 + 5x_2 + 9x_3 &= -9
\end{aligned}
\tag{3}
\]

with the coefficients of each variable aligned in columns, the matrix 

\[
\begin{bmatrix}
1 & -2 & 1 \\
0 & 2 & -8 \\
-4 & 5 & 9
\end{bmatrix}
\]

is called the coefficient matrix (or matrix of coefficients) of the system (3), and 

\[
\begin{bmatrix}
1 & -2 & 1 & 0 \\
0 & 2 & -8 & 8 \\
-4 & 5 & 9 & -9
\end{bmatrix}
\tag{4}
\]

is called the augmented matrix of the system. (The second row here contains a zero because
the second equation could be written as $0 \cdot x_1 + 2x_2 - 8x_3 = 8$.) An augmented
matrix of a system consists of the coefficient matrix with an added column containing 
the constants from the right sides of the equations. 

The size of a matrix tells how many rows and columns it has. The augmented matrix 
(4) above has 3 rows and 4 columns and is called a $3 \times 4$ (read “3 by 4”) matrix. If $m$
and $n$ are positive integers, an $m \times n$ matrix is a rectangular array of numbers with $m$
rows and $n$ columns. (The number of rows always comes first.) Matrix notation will
simplify the calculations in the examples that follow. 
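
In a program, a matrix of this kind is commonly stored as a list of rows. The sketch below is one possible representation, offered as an illustration rather than as the text's own notation; the helper names `augment` and `size` are assumptions.

```python
# Coefficients and right sides of system (3), one row per equation.
coefficient_matrix = [
    [1, -2, 1],
    [0, 2, -8],    # x1 is absent from equation 2, so its coefficient is 0
    [-4, 5, 9],
]
right_sides = [0, 8, -9]

def augment(coeffs, constants):
    """Append each equation's constant to its row of coefficients."""
    return [row + [c] for row, c in zip(coeffs, constants)]

def size(matrix):
    """Return (number of rows, number of columns); rows come first."""
    return (len(matrix), len(matrix[0]))

augmented_matrix = augment(coefficient_matrix, right_sides)
print(size(coefficient_matrix))  # (3, 3)
print(size(augmented_matrix))    # (3, 4)
```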

Solving a Linear System 

This section and the next describe an algorithm, or a systematic procedure, for solving 
linear systems. The basic strategy is to replace one system with an equivalent system 
(i.e., one with the same solution set) that is easier to solve.

Roughly speaking, use the $x_1$ term in the first equation of a system to eliminate
the $x_1$ terms in the other equations. Then use the $x_2$ term in the second equation to
eliminate the $x_2$ terms in the other equations, and so on, until you finally obtain a very
simple equivalent system of equations. 

Three basic operations are used to simplify a linear system: Replace one equation 
by the sum of itself and a multiple of another equation, interchange two equations, and 
multiply all the terms in an equation by a nonzero constant. After the first example, you 
will see why these three operations do not change the solution set of the system. 
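
The three operations can be written as small functions acting on a list of rows, where each row holds the coefficients of one equation followed by its right side. This is a minimal sketch under that assumed representation; the function names are hypothetical, rows are numbered from 0 as is usual in Python, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def replace_row(m, target, source, multiple):
    """Replace row `target` by the sum of itself and `multiple` times row `source`."""
    m[target] = [t + multiple * s for t, s in zip(m[target], m[source])]

def interchange_rows(m, i, j):
    """Interchange rows i and j."""
    m[i], m[j] = m[j], m[i]

def scale_row(m, i, constant):
    """Multiply every entry of row i by a nonzero constant."""
    if constant == 0:
        raise ValueError("the constant must be nonzero")
    m[i] = [constant * entry for entry in m[i]]

# Rows hold the coefficients and right side of each equation in system (3).
system = [
    [1, -2, 1, 0],
    [0, 2, -8, 8],
    [-4, 5, 9, -9],
]

replace_row(system, 2, 0, 4)            # equation 3 + 4 * equation 1
print(system[2])                        # [0, -3, 13, -9]

scale_row(system, 1, Fraction(1, 2))    # multiply equation 2 by 1/2
print([int(v) for v in system[1]])      # [0, 1, -4, 4]

interchange_rows(system, 1, 2)          # swap equations 2 and 3
```

The zero check in `scale_row` reflects the requirement that the constant be nonzero: multiplying an equation by zero would discard the information it carries.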
EXAMPLE 1 Solve system (3).

SOLUTION The elimination procedure is shown here with and without matrix notation, 
and the results are placed side by side for comparison: 


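\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 \\
2x_2 - 8x_3 &= 8 \\
-4x_1 + 5x_2 + 9x_3 &= -9
\end{aligned}
\qquad
\begin{bmatrix}
1 & -2 & 1 & 0 \\
0 & 2 & -8 & 8 \\
-4 & 5 & 9 & -9
\end{bmatrix}
\]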

Keep $x_1$ in the first equation and eliminate it from the other equations. To do so, add 4
times equation 1 to equation 3. After some practice, this type of calculation is usually 
performed mentally: 


\[
\begin{array}{lrcr}
4 \cdot \text{[equation 1]:} & 4x_1 - 8x_2 + 4x_3 & = & 0 \\
{} + \text{[equation 3]:} & -4x_1 + 5x_2 + 9x_3 & = & -9 \\
\hline
\text{[new equation 3]:} & -3x_2 + 13x_3 & = & -9
\end{array}
\]

The result of this calculation is written in place of the original third equation: 


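\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 \\
2x_2 - 8x_3 &= 8 \\
-3x_2 + 13x_3 &= -9
\end{aligned}
\qquad
\begin{bmatrix}
1 & -2 & 1 & 0 \\
0 & 2 & -8 & 8 \\
0 & -3 & 13 & -9
\end{bmatrix}
\]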

Now, multiply equation 2 by 1/2 in order to obtain 1 as the coefficient for $x_2$. (This
calculation will simplify the arithmetic in the next step.) 


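\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 \\
x_2 - 4x_3 &= 4 \\
-3x_2 + 13x_3 &= -9
\end{aligned}
\qquad
\begin{bmatrix}
1 & -2 & 1 & 0 \\
0 & 1 & -4 & 4 \\
0 & -3 & 13 & -9
\end{bmatrix}
\]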

Use the $x_2$ in equation 2 to eliminate the $-3x_2$ in equation 3. The “mental” computation
is

\[
\begin{array}{lrcr}
3 \cdot \text{[equation 2]:} & 3x_2 - 12x_3 & = & 12 \\
{} + \text{[equation 3]:} & -3x_2 + 13x_3 & = & -9 \\
\hline
\text{[new equation 3]:} & x_3 & = & 3
\end{array}
\]

The new system has a triangular form:¹


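\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0 \\
x_2 - 4x_3 &= 4 \\
x_3 &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & -2 & 1 & 0 \\
0 & 1 & -4 & 4 \\
0 & 0 & 1 & 3
\end{bmatrix}
\]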

Eventually, you want to eliminate the $-2x_2$ term from equation 1, but it is more efficient
to use the $x_3$ in equation 3 first, to eliminate the $-4x_3$ and $+x_3$ terms in equations 2 and 1.


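The two “mental” calculations are

\[
\begin{array}{rrcl}
4\cdot[\text{eq. 3}]: & 4x_3 &=& 12\\
+\,[\text{eq. 2}]: & x_2 - 4x_3 &=& 4\\
\hline
[\text{new eq. 2}]: & x_2 &=& 16
\end{array}
\qquad
\begin{array}{rrcl}
-1\cdot[\text{eq. 3}]: & -x_3 &=& -3\\
+\,[\text{eq. 1}]: & x_1 - 2x_2 + x_3 &=& 0\\
\hline
[\text{new eq. 1}]: & x_1 - 2x_2 &=& -3
\end{array}
\]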


¹ The intuitive term triangular will be replaced by a precise term in the next section.

It is convenient to combine the results of these two operations:


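\[
\begin{aligned}
x_1 - 2x_2 &= -3\\
x_2 &= 16\\
x_3 &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & -2 & 0 & -3\\
0 & 1 & 0 & 16\\
0 & 0 & 1 & 3
\end{bmatrix}
\]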


Now, having cleaned out the column above the $x_3$ in equation 3, move back to the $x_2$ in
equation 2 and use it to eliminate the $-2x_2$ above it. Because of the previous work with
$x_3$, there is now no arithmetic involving $x_3$ terms. Add 2 times equation 2 to equation
1 and obtain the system:



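\[
\begin{aligned}
x_1 &= 29\\
x_2 &= 16\\
x_3 &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & 0 & 0 & 29\\
0 & 1 & 0 & 16\\
0 & 0 & 1 & 3
\end{bmatrix}
\]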


The work is essentially done. It shows that the only solution of the original system is 
(29, 16, 3). However, since there are so many calculations involved, it is a good practice 
to check the work. To verify that (29,16, 3) is a solution, substitute these values into 
the left side of the original system, and compute: 

\[
\begin{aligned}
(29) - 2(16) + (3) &= 29 - 32 + 3 = 0\\
2(16) - 8(3) &= 32 - 24 = 8\\
-4(29) + 5(16) + 9(3) &= -116 + 80 + 27 = -9
\end{aligned}
\]

[Figure: Each of the original equations determines a plane in three-dimensional space. The point (29, 16, 3) lies in all three planes.]

The results agree with the right side of the original system, so (29,16, 3) is a solution 
of the system. ■ 
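
The elimination in Example 1 can also be replayed mechanically. The following is a minimal sketch in Python, not part of the original text: the helper names `replace` and `scale` are hypothetical, rows are indexed from 0, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

# Augmented matrix of the system solved in Example 1.
M = [[Fraction(v) for v in row] for row in
     [[1, -2, 1, 0],
      [0, 2, -8, 8],
      [-4, 5, 9, -9]]]

def replace(M, target, source, c):
    """Row replacement: add c times row `source` to row `target` (in place)."""
    M[target] = [t + c * s for t, s in zip(M[target], M[source])]

def scale(M, row, c):
    """Scaling: multiply every entry of a row by the nonzero constant c (in place)."""
    M[row] = [c * v for v in M[row]]

# The same sequence of operations as in the example (0-based row indices).
replace(M, 2, 0, 4)            # add 4 times equation 1 to equation 3
scale(M, 1, Fraction(1, 2))    # multiply equation 2 by 1/2
replace(M, 2, 1, 3)            # add 3 times equation 2 to equation 3
replace(M, 1, 2, 4)            # add 4 times equation 3 to equation 2
replace(M, 0, 2, -1)           # add -1 times equation 3 to equation 1
replace(M, 0, 1, 2)            # add 2 times equation 2 to equation 1

solution = [row[-1] for row in M]
print(solution)
```

The last column of the final matrix holds the values of $x_1$, $x_2$, $x_3$, which can be compared with the hand computation above.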


Example 1 illustrates how operations on equations in a linear system correspond to 
operations on the appropriate rows of the augmented matrix. The three basic operations 
listed earlier correspond to the following operations on the augmented matrix. 


ELEMENTARY ROW OPERATIONS 

1. (Replacement) Replace one row by the sum of itself and a multiple of another
row.²

2. (Interchange) Interchange two rows. 

3. (Scaling) Multiply all entries in a row by a nonzero constant. 


Row operations can be applied to any matrix, not merely to one that arises as the 
augmented matrix of a linear system. Two matrices are called row equivalent if there 
is a sequence of elementary row operations that transforms one matrix into the other. 

It is important to note that row operations are reversible. If two rows are interchanged,
they can be returned to their original positions by another interchange. If a
row is scaled by a nonzero constant $c$, then multiplying the new row by $1/c$ produces
the original row. Finally, consider a replacement operation involving two rows, say
rows 1 and 2, and suppose that $c$ times row 1 is added to row 2 to produce a new row 2.
To “reverse” this operation, add $-c$ times row 1 to (new) row 2 and obtain the original
row 2. See Exercises 29-32 at the end of this section.
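
Reversibility can be checked directly on a small matrix. The sketch below is an illustration only: the function names `interchange`, `scaling`, and `replacement` are hypothetical, each returns a new matrix rather than modifying its argument, and row indices start at 0.

```python
from fractions import Fraction

def interchange(M, i, j):
    """Return a copy of M with rows i and j swapped."""
    out = [row[:] for row in M]
    out[i], out[j] = out[j], out[i]
    return out

def scaling(M, i, c):
    """Return a copy of M with row i multiplied by the nonzero constant c."""
    if c == 0:
        raise ValueError("scaling constant must be nonzero")
    out = [row[:] for row in M]
    out[i] = [c * v for v in M[i]]
    return out

def replacement(M, target, source, c):
    """Return a copy of M with c times row `source` added to row `target`."""
    out = [row[:] for row in M]
    out[target] = [t + c * s for t, s in zip(M[target], M[source])]
    return out

A = [[Fraction(v) for v in row] for row in
     [[1, -2, 1, 0],
      [0, 2, -8, 8],
      [-4, 5, 9, -9]]]
c = Fraction(5, 2)

# Each operation is undone by an operation of the same type.
undo_interchange = interchange(interchange(A, 0, 2), 0, 2)
undo_scaling = scaling(scaling(A, 1, c), 1, 1 / c)
undo_replacement = replacement(replacement(A, 1, 0, c), 1, 0, -c)
print(undo_interchange == A, undo_scaling == A, undo_replacement == A)
```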


² A common paraphrase of row replacement is “Add to one row a multiple of another row.”












At the moment, we are interested in row operations on the augmented matrix of a 
system of linear equations. Suppose a system is changed to a new one via row operations.
By considering each type of row operation, you can see that any solution of the
original system remains a solution of the new system. Conversely, since the original 
system can be produced via row operations on the new system, each solution of the new 
system is also a solution of the original system. This discussion justifies the following 
statement. 

If the augmented matrices of two linear systems are row equivalent, then the two 
systems have the same solution set. 

Though Example 1 is lengthy, you will find that after some practice, the calculations 
go quickly. Row operations in the text and exercises will usually be extremely easy to 
perform, allowing you to focus on the underlying concepts. Still, you must learn to 
perform row operations accurately because they will be used throughout the text. 

The rest of this section shows how to use row operations to determine the size of a 
solution set, without completely solving the linear system. 


Existence and Uniqueness Questions 


Section 1.2 will show why a solution set for a linear system contains either no solutions,
one solution, or infinitely many solutions. To determine which possibility is true for a
particular system, we ask two questions.


TWO FUNDAMENTAL QUESTIONS ABOUT A LINEAR SYSTEM 

1. Is the system consistent; that is, does at least one solution exist?

2. If a solution exists, is it the only one; that is, is the solution unique?


These two questions will appear throughout the text, in many different guises. This 
section and the next will show how to answer these questions via row operations on the 
augmented matrix. 

EXAMPLE 2 Determine if the following system is consistent: 


\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0\\
2x_2 - 8x_3 &= 8\\
-4x_1 + 5x_2 + 9x_3 &= -9
\end{aligned}
\]


SOLUTION This is the system from Example 1. Suppose that we have performed the 
row operations necessary to obtain the triangular form 


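\[
\begin{aligned}
x_1 - 2x_2 + x_3 &= 0\\
x_2 - 4x_3 &= 4\\
x_3 &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & -2 & 1 & 0\\
0 & 1 & -4 & 4\\
0 & 0 & 1 & 3
\end{bmatrix}
\]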


At this point, we know $x_3$. Were we to substitute the value of $x_3$ into equation 2, we
could compute $x_2$ and hence could determine $x_1$ from equation 1. So a solution exists;
the system is consistent. (In fact, $x_2$ is uniquely determined by equation 2 since $x_3$ has
only one possible value, and $x_1$ is therefore uniquely determined by equation 1. So the
solution is unique.) ■

EXAMPLE 3 Determine if the following system is consistent: 


\[
\begin{aligned}
x_2 - 4x_3 &= 8\\
2x_1 - 3x_2 + 2x_3 &= 1\\
5x_1 - 8x_2 + 7x_3 &= 1
\end{aligned}
\tag{5}
\]

SOLUTION The augmented matrix is 

\[
\begin{bmatrix}
0 & 1 & -4 & 8\\
2 & -3 & 2 & 1\\
5 & -8 & 7 & 1
\end{bmatrix}
\]


To obtain an $x_1$ in the first equation, interchange rows 1 and 2:

\[
\begin{bmatrix}
2 & -3 & 2 & 1\\
0 & 1 & -4 & 8\\
5 & -8 & 7 & 1
\end{bmatrix}
\]


To eliminate the $5x_1$ term in the third equation, add $-5/2$ times row 1 to row 3:

\[
\begin{bmatrix}
2 & -3 & 2 & 1\\
0 & 1 & -4 & 8\\
0 & -1/2 & 2 & -3/2
\end{bmatrix}
\tag{6}
\]

Next, use the $x_2$ term in the second equation to eliminate the $-(1/2)x_2$ term from the
third equation. Add 1/2 times row 2 to row 3:

\[
\begin{bmatrix}
2 & -3 & 2 & 1\\
0 & 1 & -4 & 8\\
0 & 0 & 0 & 5/2
\end{bmatrix}
\tag{7}
\]



[Figure: This system is inconsistent because there is no point that lies in all three planes.]


The augmented matrix is now in triangular form. To interpret it correctly, go back to 
equation notation: 

\[
\begin{aligned}
2x_1 - 3x_2 + 2x_3 &= 1\\
x_2 - 4x_3 &= 8\\
0 &= 5/2
\end{aligned}
\tag{8}
\]

The equation $0 = 5/2$ is a short form of $0x_1 + 0x_2 + 0x_3 = 5/2$. This system in
triangular form obviously has a built-in contradiction. There are no values of $x_1, x_2, x_3$ that
satisfy (8) because the equation $0 = 5/2$ is never true. Since (8) and (5) have the same
solution set, the original system is inconsistent (i.e., has no solution). ■
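
The test for inconsistency used in Example 3 can be phrased as a small check on the rows of the augmented matrix. The sketch below is illustrative only: the helper `is_contradiction_row` is a hypothetical name, and the three row operations are the ones described in the example, applied with exact fractions.

```python
from fractions import Fraction

def is_contradiction_row(row):
    """True if the row encodes 0 = b with b nonzero (zero coefficients, nonzero last entry)."""
    *coefficients, rhs = row
    return all(c == 0 for c in coefficients) and rhs != 0

# Augmented matrix of system (5).
M = [[Fraction(v) for v in row] for row in
     [[0, 1, -4, 8],
      [2, -3, 2, 1],
      [5, -8, 7, 1]]]

M[0], M[1] = M[1], M[0]                                            # interchange rows 1 and 2
M[2] = [a + Fraction(-5, 2) * b for a, b in zip(M[2], M[0])]       # add -5/2 times row 1 to row 3
M[2] = [a + Fraction(1, 2) * b for a, b in zip(M[2], M[1])]        # add 1/2 times row 2 to row 3

inconsistent = any(is_contradiction_row(row) for row in M)
print(M[2], inconsistent)
```

The check is only meaningful once the matrix has been brought to triangular form; before that, a contradiction may be present without any single row showing it.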


Pay close attention to the augmented matrix in (7). Its last row is typical of an 
inconsistent system in triangular form. 

NUMERICAL NOTE

In real-world problems, systems of linear equations are solved by a computer. For 
a square coefficient matrix, computer programs nearly always use the elimination 
algorithm given here and in Section 1.2, modified slightly for improved accuracy. 

The vast majority of linear algebra problems in business and industry are 
solved with programs that use floating point arithmetic. Numbers are represented 
as decimals $\pm .d_1 \cdots d_p \times 10^r$, where $r$ is an integer and the number $p$ of digits to
the right of the decimal point is usually between 8 and 16. Arithmetic with such 
numbers typically is inexact, because the result must be rounded (or truncated) to 
the number of digits stored. “Roundoff error” is also introduced when a number 
such as 1/3 is entered into the computer, since its decimal representation must be 
approximated by a finite number of digits. Fortunately, inaccuracies in floating 
point arithmetic seldom cause problems. The numerical notes in this book will 
occasionally warn of issues that you may need to consider later in your career. 
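
A short experiment makes the roundoff described in this note visible. This is a sketch in Python, whose built-in floats are binary rather than decimal, so the stored digits differ from the decimal model above, but the same effect appears.

```python
from fractions import Fraction

third = 1 / 3                  # stored as the nearest representable floating point number
exact_third = Fraction(1, 3)   # the exact rational value, for comparison

# Fraction(third) recovers exactly the value that is stored in the float.
stored_error = abs(Fraction(third) - exact_third)
print(stored_error > 0)        # the stored value is close to, but not equal to, 1/3

# Because results are rounded after each operation, familiar identities can fail.
sum_is_exact = (0.1 + 0.2 == 0.3)
sum_is_close = abs((0.1 + 0.2) - 0.3) < 1e-12
print(sum_is_exact, sum_is_close)
```

For this reason, numerical code usually compares floating point results with a tolerance rather than with exact equality.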


PRACTICE PROBLEMS 


Throughout the text, practice problems should be attempted before working the exer¬ 
cises. Solutions appear after each exercise set. 


1. State in words the next elementary row operation that should be performed on the 
system in order to solve it. [More than one answer is possible in (a).] 


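a.
\[
\begin{aligned}
x_1 + 4x_2 - 2x_3 + 8x_4 &= 12\\
x_2 - 7x_3 + 2x_4 &= -4\\
5x_3 - x_4 &= 7\\
x_3 + 3x_4 &= -5
\end{aligned}
\]

b.
\[
\begin{aligned}
x_1 - 3x_2 + 5x_3 - 2x_4 &= 0\\
x_2 + 8x_3 &= -4\\
2x_3 &= 3\\
x_4 &= 1
\end{aligned}
\]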




2. The augmented matrix of a linear system has been transformed by row operations 
into the form below. Determine if the system is consistent. 


\[
\begin{bmatrix}
1 & 5 & 2 & -6\\
0 & 4 & -7 & 2\\
0 & 0 & 5 & 0
\end{bmatrix}
\]


3. Is $(3, 4, -2)$ a solution of the following system?

\[
\begin{aligned}
5x_1 - x_2 + 2x_3 &= 7\\
-2x_1 + 6x_2 + 9x_3 &= 0\\
-7x_1 + 5x_2 - 3x_3 &= -7
\end{aligned}
\]

4. For what values of $h$ and $k$ is the following system consistent?

\[
\begin{aligned}
2x_1 - x_2 &= h\\
-6x_1 + 3x_2 &= k
\end{aligned}
\]

1.1 EXERCISES


Solve each system in Exercises 1-4 by using elementary row 
operations on the equations or on the augmented matrix. Follow 
the systematic elimination procedure described in this section. 

1 * 5x2 = 7 2 . + 6 x 2 = _ 3 

— 2,xi — 1x2 ― — 5 + 7^2 — 10 

3. Find the point $(x_1, x_2)$ that lies on the line $x_1 + 2x_2 = 4$ and
on the line $x_1 - x_2 = 1$. See the figure.

4. Find the point of intersection of the lines $x_1 + 2x_2 = -13$
and $3x_1 - 2x_2 = 1$.


3 0 -2 -7 " 

10 3 6 

0 10 2 

0 0 1 -2_ 

Solve the systems in Exercises 11-14. 

11. X 2 + 5x3 = _ 4 
X\ -f - 4x2 + 3x3 = — 2 

2xi -f- 1x2 "f - X 3 — — 2 

12 . X\ — 5 尤 2 + 4^3 = _ 3 
2xi — 7 又 2 + 3^3 =： — 2 

— 2xi ~h X 2 ~h 7^3 — — 1 

13 . X\ — 3^3 — 8 

2xi + 2 又 2 + 9 又 3 = 7 

X 2 5^3 = — 2 

14. — 6 x 3 = _ 8 

X 2 + 2^3 — 3 

2>X\ - 6^2 — 2^3 = — 4 


Consider each matrix in Exercises 5 and 6 as the augmented matrix
of a linear system. State in words the next two elementary row
operations that should be performed in the process of solving the
system.

5.
\[
\begin{bmatrix}
1 & -4 & -3 & 0 & 7\\
0 & 1 & 4 & 0 & 6\\
0 & 0 & 1 & 0 & 2\\
0 & 0 & 0 & 1 & -5
\end{bmatrix}
\]

6.
\[
\begin{bmatrix}
1 & -6 & 4 & 0 & -1\\
0 & 2 & -7 & 0 & 4\\
0 & 0 & 1 & 2 & -3\\
0 & 0 & 4 & 1 & 2
\end{bmatrix}
\]


Determine if the systems in Exercises 15 and 16 are consistent. 
Do not completely solve the systems. 

15 . X\ — 6 x 2 =5 

X2 _ 4X3 X4 — 0 

—X\ + 6 x 2 + X3 + 5^4 = 3 
— X 2 + 5X3 + 4^4 — 0 

16 . 2 ,xi — 4^4 = 一 10 

3x2 + 3x3 — 0 

又 3 + 4X4 = _ 1 

— + 2x2 + 3^3 ~h = 5 


In Exercises 7-10, the augmented matrix of a linear system has 
been reduced by row operations to the form shown. In each case, 
continue the appropriate row operations and describe the solution 
set of the original system. 

'1 7 3 -4" 

7 0 1 -1 3 

* 0 0 0 1 

0 0 1-2 


17. Do the three lines $2x_1 + 3x_2 = -1$, $6x_1 + 5x_2 = 0$, and
$2x_1 - 5x_2 = 7$ have a common point of intersection? Explain.

18. Do the three planes $2x_1 + 4x_2 + 4x_3 = 4$, $x_2 - 2x_3 = -2$,
and $2x_1 + 3x_2 = 0$ have at least one common point of intersection? Explain.

In Exercises 19-22, determine the value(s) of h such that the 
matrix is the augmented matrix of a consistent linear system. 


1-5400 
0 10 10 
0 0 3 0 0 
0 0 0 2 0 


19. 

21 . 


"1 
_3 

h 

4' 

20 . 

6 

8 _ 

"1 

4 

- 2 ' 

22 . 

_3 

h 

—6 


h —5 
-8 6 ■ 

12 h 
-6 -3 


1-1 0 0-5 

0 1-2 0-7 

001-32 
0 0 0 1 4 


In Exercises 23 and 24, key statements from this section are 
either quoted directly, restated slightly (but still true), or altered 
in some way that makes them false in some cases. Mark each 
statement True or False, and justify your answer. (If true, give the 
approximate location where a similar statement appears, or refer
to a definition or theorem. If false, give the location of a statement 
that has been quoted or used incorrectly, or cite an example that 
shows the statement is not true in all cases.) Similar true/false 
questions will appear in many sections of the text. 

23. a. Every elementary row operation is reversible. 

b. A 5 x 6 matrix has six rows. 

c. The solution set of a linear system involving variables
$x_1, \ldots, x_n$ is a list of numbers $(s_1, \ldots, s_n)$ that makes each
equation in the system a true statement when the values
$s_1, \ldots, s_n$ are substituted for $x_1, \ldots, x_n$, respectively.

d. Two fundamental questions about a linear system involve 
existence and uniqueness. 

24. a. Two matrices are row equivalent if they have the same 

number of rows. 


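In Exercises 29-32, find the elementary row operation that transforms
the first matrix into the second, and then find the reverse
row operation that transforms the second matrix into the first.

29.
\[
\begin{bmatrix}
0 & -2 & 5\\
1 & 3 & -5\\
3 & -1 & 6
\end{bmatrix},
\quad
\begin{bmatrix}
3 & -1 & 6\\
1 & 3 & -5\\
0 & -2 & 5
\end{bmatrix}
\]

30.
\[
\begin{bmatrix}
1 & 3 & -4\\
0 & -2 & 6\\
0 & -5 & 10
\end{bmatrix},
\quad
\begin{bmatrix}
1 & 3 & -4\\
0 & -2 & 6\\
0 & 1 & -2
\end{bmatrix}
\]

31.
\[
\begin{bmatrix}
1 & -2 & 1 & 0\\
0 & 5 & -2 & 8\\
4 & -1 & 3 & -6
\end{bmatrix},
\quad
\begin{bmatrix}
1 & -2 & 1 & 0\\
0 & 5 & -2 & 8\\
0 & 7 & -1 & -6
\end{bmatrix}
\]

32.
\[
\begin{bmatrix}
1 & 2 & -5 & 0\\
0 & 1 & -3 & -2\\
0 & 4 & -12 & 7
\end{bmatrix},
\quad
\begin{bmatrix}
1 & 2 & -5 & 0\\
0 & 1 & -3 & -2\\
0 & 0 & 0 & 15
\end{bmatrix}
\]
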

b. Elementary row operations on an augmented matrix never 
change the solution set of the associated linear system. 

c. Two equivalent linear systems can have different solution 
sets. 

d. A consistent system of linear equations has one or more 
solutions. 

25. Find an equation involving $g$, $h$, and $k$ that makes
this augmented matrix correspond to a consistent system:

\[
\begin{bmatrix}
1 & -4 & 7 & g\\
0 & 3 & -5 & h\\
-2 & 5 & -9 & k
\end{bmatrix}
\]

26. Suppose the system below is consistent for all possible values
of $f$ and $g$. What can you say about the coefficients $c$ and
$d$? Justify your answer.

\[
\begin{aligned}
2x_1 + 4x_2 &= f\\
cx_1 + dx_2 &= g
\end{aligned}
\]


An important concern in the study of heat transfer is to determine 
the steady-state temperature distribution of a thin plate when the 
temperature around the boundary is known. Assume the plate 
shown in the figure represents a cross section of a metal beam, 
with negligible heat flow in the direction perpendicular to the 
plate. Let $T_1, \ldots, T_4$ denote the temperatures at the four interior
nodes of the mesh in the figure. The temperature at a node is
approximately equal to the average of the four nearest nodes, to
the left, above, to the right, and below.³ For instance,

\[
T_1 = (10 + 20 + T_2 + T_4)/4, \quad\text{or}\quad 4T_1 - T_2 - T_4 = 30
\]


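[Figure: a plate with a mesh of four interior nodes labeled 1, 2, 3, and 4; boundary temperature labels of 20° and 30° are legible.]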


27. Suppose $a$, $b$, $c$, and $d$ are constants such that $a$ is not zero
and the system below is consistent for all possible values of
$f$ and $g$. What can you say about the numbers $a$, $b$, $c$, and
$d$? Justify your answer.

\[
\begin{aligned}
ax_1 + bx_2 &= f\\
cx_1 + dx_2 &= g
\end{aligned}
\]

28. Construct three different augmented matrices for linear systems
whose solution set is $x_1 = 3$, $x_2 = -2$, $x_3 = -1$.


33. Write a system of four equations whose solution gives estimates
for the temperatures $T_1, \ldots, T_4$.

34. Solve the system of equations from Exercise 33. [Hint: To 
speed up the calculations, interchange rows 1 and 4 before 
starting “replace” operations.] 


³ See Frank M. White, Heat and Mass Transfer (Reading, MA:
Addison-Wesley Publishing, 1991), pp. 145-149. 


SOLUTIONS TO PRACTICE PROBLEMS 


1. a. For “hand computation,” the best choice is to interchange equations 3 and 4.
Another possibility is to multiply equation 3 by 1/5. Or, replace equation 4 by
its sum with $-1/5$ times row 3. (In any case, do not use the $x_2$ in equation 2 to
eliminate the $4x_2$ in equation 1. Wait until a triangular form has been reached and
the $x_3$ terms and $x_4$ terms have been eliminated from the first two equations.)

b. The system is in triangular form. Further simplification begins with the $x_4$ in the
fourth equation. Use the $x_4$ to eliminate all $x_4$ terms above it. The appropriate
step now is to add 2 times equation 4 to equation 1. (After that, move to equation
3, multiply it by 1/2, and then use the equation to eliminate the $x_3$ terms above
it.)

[Figure for Practice Problem 3: Since $(3, 4, -2)$ satisfies the first two equations, it is on the line of the intersection of the first two planes. Since $(3, 4, -2)$ does not satisfy all three equations, it does not lie on all three planes.]

2. The system corresponding to the augmented matrix is 

\[
\begin{aligned}
x_1 + 5x_2 + 2x_3 &= -6\\
4x_2 - 7x_3 &= 2\\
5x_3 &= 0
\end{aligned}
\]

The third equation makes $x_3 = 0$, which is certainly an allowable value for $x_3$. After
eliminating the $x_3$ terms in equations 1 and 2, you could go on to solve for unique
values for $x_2$ and $x_1$. Hence a solution exists, and it is unique. Contrast this situation
with that in Example 3.

3. It is easy to check if a specific list of numbers is a solution. Set $x_1 = 3$, $x_2 = 4$, and
$x_3 = -2$, and find that

\[
\begin{aligned}
5(3) - (4) + 2(-2) &= 15 - 4 - 4 = 7\\
-2(3) + 6(4) + 9(-2) &= -6 + 24 - 18 = 0\\
-7(3) + 5(4) - 3(-2) &= -21 + 20 + 6 = 5
\end{aligned}
\]

Although the first two equations are satisfied, the third is not, so (3, 4, —2) is not a 
solution of the system. Notice the use of parentheses when making the substitutions. 
They are strongly recommended as a guard against arithmetic errors. 

4. When the second equation is replaced by its sum with 3 times the first equation, the 
system becomes 

\[
\begin{aligned}
2x_1 - x_2 &= h\\
0 &= k + 3h
\end{aligned}
\]

If $k + 3h$ is nonzero, the system has no solution. The system is consistent for any
values of $h$ and $k$ that make $k + 3h = 0$.


1.2 ROW REDUCTION AND ECHELON FORMS 


This section refines the method of Section 1.1 into a row reduction algorithm that will 
enable us to analyze any system of linear equations.¹ By using only the first part of
the algorithm, we will be able to answer the fundamental existence and uniqueness 
questions posed in Section 1.1. 

The algorithm applies to any matrix, whether or not the matrix is viewed as an 
augmented matrix for a linear system. So the first part of this section concerns an 
arbitrary rectangular matrix and begins by introducing two important classes of matrices 
that include the “triangular” matrices of Section 1.1. In the definitions that follow, a 
nonzero row or column in a matrix means a row or column that contains at least one 
nonzero entry; a leading entry of a row refers to the leftmost nonzero entry (in a nonzero 
row). 


¹ The algorithm here is a variant of what is commonly called Gaussian elimination. A similar elimination
method for linear systems was used by Chinese mathematicians in about 250 B.C. The process was unknown 
in Western culture until the nineteenth century, when a famous German mathematician, Carl Friedrich Gauss, 
discovered it. A German engineer, Wilhelm Jordan, popularized the algorithm in an 1888 text on geodesy. 

DEFINITION


A rectangular matrix is in echelon form (or row echelon form) if it has the 
following three properties: 

1. All nonzero rows are above any rows of all zeros. 

2. Each leading entry of a row is in a column to the right of the leading entry of 
the row above it. 

3. All entries in a column below a leading entry are zeros. 

If a matrix in echelon form satisfies the following additional conditions, then it is 

in reduced echelon form (or reduced row echelon form): 

4. The leading entry in each nonzero row is 1. 

5. Each leading 1 is the only nonzero entry in its column. 


An echelon matrix (respectively, reduced echelon matrix) is one that is in echelon 
form (respectively, reduced echelon form). Property 2 says that the leading entries form 
an echelon (“steplike”) pattern that moves down and to the right through the matrix.
Property 3 is a simple consequence of property 2, but we include it for emphasis. 
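
The five properties in the definition translate into a direct check. The sketch below is one possible reading of the definition in Python; the function names are hypothetical, and a matrix is assumed to be given as a list of rows. Only properties 1 and 2 are tested for echelon form, since property 3 follows from them.

```python
from fractions import Fraction

def leading_index(row):
    """Column index of the leftmost nonzero entry, or None for a row of all zeros."""
    for j, value in enumerate(row):
        if value != 0:
            return j
    return None

def is_echelon(M):
    """Properties 1 and 2: zero rows at the bottom, leading entries moving strictly right."""
    previous_lead = -1
    seen_zero_row = False
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row or lead <= previous_lead:
            return False
        previous_lead = lead
    return True

def is_reduced_echelon(M):
    """Properties 4 and 5 in addition: each leading entry is 1 and is alone in its column."""
    if not is_echelon(M):
        return False
    for i, row in enumerate(M):
        lead = leading_index(row)
        if lead is None:
            continue
        if row[lead] != 1:
            return False
        if any(M[k][lead] != 0 for k in range(len(M)) if k != i):
            return False
    return True

triangular = [[2, -3, 2, 1], [0, 1, -4, 8], [0, 0, 0, Fraction(5, 2)]]
solved = [[1, 0, 0, 29], [0, 1, 0, 16], [0, 0, 1, 3]]
print(is_echelon(triangular), is_reduced_echelon(triangular), is_reduced_echelon(solved))
```

The two sample matrices are the final matrices of Examples 3 and 1 in Section 1.1.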

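The “triangular” matrices of Section 1.1, such as

\[
\begin{bmatrix}
2 & -3 & 2 & 1\\
0 & 1 & -4 & 8\\
0 & 0 & 0 & 5/2
\end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix}
1 & 0 & 0 & 29\\
0 & 1 & 0 & 16\\
0 & 0 & 1 & 3
\end{bmatrix}
\]

are in echelon form. In fact, the second matrix is in reduced echelon form. Here are
additional examples.

EXAMPLE 1 The following matrices are in echelon form. The leading entries (■)
may have any nonzero value; the starred entries (*) may have any value (including zero).

\[
\begin{bmatrix}
\blacksquare & * & * & *\\
0 & \blacksquare & * & *\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix},
\quad
\begin{bmatrix}
0 & \blacksquare & * & * & * & * & * & * & * & *\\
0 & 0 & 0 & \blacksquare & * & * & * & * & * & *\\
0 & 0 & 0 & 0 & \blacksquare & * & * & * & * & *\\
0 & 0 & 0 & 0 & 0 & \blacksquare & * & * & * & *\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \blacksquare & *
\end{bmatrix}
\]

The following matrices are in reduced echelon form because the leading entries are 1's,
and there are 0's below and above each leading 1.

\[
\begin{bmatrix}
1 & 0 & * & *\\
0 & 1 & * & *\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix},
\quad
\begin{bmatrix}
0 & 1 & * & 0 & 0 & 0 & * & * & 0 & *\\
0 & 0 & 0 & 1 & 0 & 0 & * & * & 0 & *\\
0 & 0 & 0 & 0 & 1 & 0 & * & * & 0 & *\\
0 & 0 & 0 & 0 & 0 & 1 & * & * & 0 & *\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & *
\end{bmatrix}
\]

■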


Any nonzero matrix may be row reduced (that is, transformed by elementary row 
operations) into more than one matrix in echelon form, using different sequences of row 
operations. However, the reduced echelon form one obtains from a matrix is unique. 
The following theorem is proved in Appendix A at the end of the text. 


THEOREM 1 Uniqueness of the Reduced Echelon Form

Each matrix is row equivalent to one and only one reduced echelon matrix.

DEFINITION

If a matrix $A$ is row equivalent to an echelon matrix $U$, we call $U$ an echelon form
(or row echelon form) of $A$; if $U$ is in reduced echelon form, we call $U$ the reduced
echelon form of $A$. [Most matrix programs and calculators with matrix capabilities
use the abbreviation RREF for reduced (row) echelon form. Some use REF for (row)
echelon form.]

Pivot Positions 

When row operations on a matrix produce an echelon form, further row operations to 
obtain the reduced echelon form do not change the positions of the leading entries. Since 
the reduced echelon form is unique, the leading entries are always in the same positions 
in any echelon form obtained from a given matrix. These leading entries correspond to 
leading 1's in the reduced echelon form.


A pivot position in a matrix $A$ is a location in $A$ that corresponds to a leading 1
in the reduced echelon form of A. A pivot column is a column of A that contains 
a pivot position. 


In Example 1, the squares (■) identify the pivot positions. Many fundamental 
concepts in the first four chapters will be connected in one way or another with pivot 
positions in a matrix. 


EXAMPLE 2 Row reduce the matrix $A$ below to echelon form, and locate the pivot
columns of $A$.

\[
A = \begin{bmatrix}
0 & -3 & -6 & 4 & 9\\
-1 & -2 & -1 & 3 & 1\\
-2 & -3 & 0 & 3 & -1\\
1 & 4 & 5 & -9 & -7
\end{bmatrix}
\]


SOLUTION Use the same basic strategy as in Section 1.1. The top of the leftmost 
nonzero column is the first pivot position. A nonzero entry, or pivot, must be placed 
in this position. A good choice is to interchange rows 1 and 4 (because the mental 
computations in the next step will not involve fractions). 


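\[
\begin{bmatrix}
\boxed{1} & 4 & 5 & -9 & -7\\
-1 & -2 & -1 & 3 & 1\\
-2 & -3 & 0 & 3 & -1\\
0 & -3 & -6 & 4 & 9
\end{bmatrix}
\]

(The boxed entry is the pivot; column 1 is the pivot column.)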


Create zeros below the pivot, 1, by adding multiples of the first row to the rows below,
and obtain matrix (1) below. The pivot position in the second row must be as far left
as possible, namely, in the second column. Choose the 2 in this position as the next
pivot.

\[
\begin{bmatrix}
1 & 4 & 5 & -9 & -7\\
0 & \boxed{2} & 4 & -6 & -6\\
0 & 5 & 10 & -15 & -15\\
0 & -3 & -6 & 4 & 9
\end{bmatrix}
\tag{1}
\]

(The boxed entry is the pivot; column 2 is the next pivot column.)










Add $-5/2$ times row 2 to row 3, and add $3/2$ times row 2 to row 4.

\[
\begin{bmatrix}
1 & 4 & 5 & -9 & -7\\
0 & 2 & 4 & -6 & -6\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & -5 & 0
\end{bmatrix}
\tag{2}
\]


The matrix in (2) is different from any encountered in Section 1.1. There is no way to 
create a leading entry in column 3! (We can’t use row 1 or 2 because doing so would 
destroy the echelon arrangement of the leading entries already produced.) However, if 
we interchange rows 3 and 4, we can produce a leading entry in column 4. 


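\[
\begin{bmatrix}
1 & 4 & 5 & -9 & -7\\
0 & 2 & 4 & -6 & -6\\
0 & 0 & 0 & \boxed{-5} & 0\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\qquad
\text{General form:}\quad
\begin{bmatrix}
\blacksquare & * & * & * & *\\
0 & \blacksquare & * & * & *\\
0 & 0 & 0 & \blacksquare & *\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

(The boxed entry is the pivot; the pivot columns are columns 1, 2, and 4.)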


The matrix is in echelon form and thus reveals that columns 1, 2, and 4 of A are pivot 
columns. 

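\[
A = \begin{bmatrix}
\boxed{0} & -3 & -6 & 4 & 9\\
-1 & \boxed{-2} & -1 & 3 & 1\\
-2 & -3 & 0 & \boxed{3} & -1\\
1 & 4 & 5 & -9 & -7
\end{bmatrix}
\tag{3}
\]

(The boxed entries mark the pivot positions; the pivot columns are columns 1, 2, and 4.)

■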


A pivot, as illustrated in Example 2, is a nonzero number in a pivot position that is 
used as needed to create zeros via row operations. The pivots in Example 2 were 1, 2, 
and —5. Notice that these numbers are not the same as the actual elements of A in the 
highlighted pivot positions shown in (3). 

With Example 2 as a guide, we are ready to describe an efficient procedure for 
transforming a matrix into an echelon or reduced echelon matrix. Careful study and 
mastery of this procedure now will pay rich dividends later in the course. 


The Row Reduction Algorithm 

The algorithm that follows consists of four steps, and it produces a matrix in echelon 
form. A fifth step produces a matrix in reduced echelon form. We illustrate the algorithm 
by an example. 

EXAMPLE 3 Apply elementary row operations to transform the following matrix 
first into echelon form and then into reduced echelon form: 


\[
\begin{bmatrix}
0 & 3 & -6 & 6 & 4 & -5\\
3 & -7 & 8 & -5 & 8 & 9\\
3 & -9 & 12 & -9 & 6 & 15
\end{bmatrix}
\]


SOLUTION 


STEP 1 

Begin with the leftmost nonzero column. This is a pivot column. The pivot 
position is at the top. 

\[
\begin{bmatrix}
0 & 3 & -6 & 6 & 4 & -5\\
3 & -7 & 8 & -5 & 8 & 9\\
3 & -9 & 12 & -9 & 6 & 15
\end{bmatrix}
\]

(Column 1 is the pivot column.)


STEP 2 

Select a nonzero entry in the pivot column as a pivot. If necessary, interchange 
rows to move this entry into the pivot position. 


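Interchange rows 1 and 3. (We could have interchanged rows 1 and 2 instead.)

\[
\begin{bmatrix}
\boxed{3} & -9 & 12 & -9 & 6 & 15\\
3 & -7 & 8 & -5 & 8 & 9\\
0 & 3 & -6 & 6 & 4 & -5
\end{bmatrix}
\]

(The boxed entry is the pivot.)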


STEP 3 

Use row replacement operations to create zeros in all positions below the pivot. 


As a preliminary step, we could divide the top row by the pivot, 3. But with two 3’s in 
column 1, it is just as easy to add —1 times row 1 to row 2. 

\[
\begin{bmatrix}
\boxed{3} & -9 & 12 & -9 & 6 & 15\\
0 & 2 & -4 & 4 & 2 & -6\\
0 & 3 & -6 & 6 & 4 & -5
\end{bmatrix}
\]

(The boxed entry is the pivot.)

STEP 4 

Cover (or ignore) the row containing the pivot position and cover all rows, if any, 
above it. Apply steps 1-3 to the submatrix that remains. Repeat the process until 
there are no more nonzero rows to modify. 


With row 1 covered, step 1 shows that column 2 is the next pivot column; for step 2, 
select as a pivot the “top” entry in that column. 

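\[
\begin{bmatrix}
3 & -9 & 12 & -9 & 6 & 15\\
0 & \boxed{2} & -4 & 4 & 2 & -6\\
0 & 3 & -6 & 6 & 4 & -5
\end{bmatrix}
\]

(The boxed entry is the pivot; column 2 is the new pivot column.)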

For step 3, we could insert an optional step of dividing the “top” row of the submatrix by 
the pivot, 2. Instead, we add —3/2 times the “top” row to the row below. This produces 


3 -9 12 -9 6 15 

0 2 -4 4 2 -6 

0 0 0 0 1 4 


Steps 1-3 require no work for this submatrix, and we have reached an echelon form of
the full matrix. If we want the reduced echelon form, we perform one more step. 


STEP 5 

Beginning with the rightmost pivot and working upward and to the left, create 
zeros above each pivot. If a pivot is not 1, make it 1 by a scaling operation. 


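The rightmost pivot is in row 3. Create zeros above it, adding suitable multiples of row
3 to rows 2 and 1.

\[
\begin{bmatrix}
3 & -9 & 12 & -9 & 0 & -9\\
0 & 2 & -4 & 4 & 0 & -14\\
0 & 0 & 0 & 0 & 1 & 4
\end{bmatrix}
\qquad
\begin{array}{l}
\text{Row 1} + (-6)\cdot\text{row 3}\\
\text{Row 2} + (-2)\cdot\text{row 3}\\
\phantom{\text{Row 3}}
\end{array}
\]

The next pivot is in row 2. Scale this row, dividing by the pivot.

\[
\begin{bmatrix}
3 & -9 & 12 & -9 & 0 & -9\\
0 & 1 & -2 & 2 & 0 & -7\\
0 & 0 & 0 & 0 & 1 & 4
\end{bmatrix}
\qquad
\text{Row 2 scaled by } \tfrac{1}{2}
\]

Create a zero in column 2 by adding 9 times row 2 to row 1.

\[
\begin{bmatrix}
3 & 0 & -6 & 9 & 0 & -72\\
0 & 1 & -2 & 2 & 0 & -7\\
0 & 0 & 0 & 0 & 1 & 4
\end{bmatrix}
\qquad
\text{Row 1} + (9)\cdot\text{row 2}
\]

Finally, scale row 1, dividing by the pivot, 3.

\[
\begin{bmatrix}
1 & 0 & -2 & 3 & 0 & -24\\
0 & 1 & -2 & 2 & 0 & -7\\
0 & 0 & 0 & 0 & 1 & 4
\end{bmatrix}
\qquad
\text{Row 1 scaled by } \tfrac{1}{3}
\]

This is the reduced echelon form of the original matrix.

■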


The combination of steps 1-4 is called the forward phase of the row reduction 
algorithm. Step 5, which produces the unique reduced echelon form, is called the 

backward phase. 
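
The forward and backward phases can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the text's own program: the function name `row_reduce` is hypothetical, exact fractions replace floating point numbers, the first nonzero entry in a pivot column is taken as the pivot, and in the backward phase each pivot row is scaled before the zeros above it are created (a slightly different order from Step 5, with the same result).

```python
from fractions import Fraction

def row_reduce(rows, reduced=True):
    """Return (matrix, pivot_columns); pivot columns are 0-based indices."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivots = []                      # (row, column) of each pivot position
    r = 0                            # rows above r are "covered" (step 4)
    # Forward phase: steps 1-4.
    for c in range(n_cols):
        if r == n_rows:
            break
        # Steps 1-2: look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if p is None:
            continue                 # column c is not a pivot column
        M[r], M[p] = M[p], M[r]      # interchange to move the pivot into position
        # Step 3: row replacements create zeros below the pivot.
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append((r, c))
        r += 1
    # Backward phase: step 5, starting from the rightmost pivot.
    if reduced:
        for pr, pc in reversed(pivots):
            pivot = M[pr][pc]
            M[pr] = [v / pivot for v in M[pr]]          # scale the pivot to 1
            for i in range(pr):
                factor = M[i][pc]
                M[i] = [a - factor * b for a, b in zip(M[i], M[pr])]
    return M, [c for _, c in pivots]

example3 = [[0, 3, -6, 6, 4, -5],
            [3, -7, 8, -5, 8, 9],
            [3, -9, 12, -9, 6, 15]]
R, pivot_columns = row_reduce(example3)
print(R, pivot_columns)
```

Because the reduced echelon form is unique (Theorem 1), a different choice of pivot rows in the forward phase would lead to the same final matrix.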


NUMERICAL NOTE 


In step 2 above, a computer program usually selects as a pivot the entry in a 
column having the largest absolute value. This strategy, called partial pivoting, 
is used because it reduces roundoff errors in the calculations. 
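
The pivot selection rule described in this note is easy to state in code. The helper below is a hypothetical sketch (the name `partial_pivot_row` is not from any library); it returns the index of the row, at or below a given row, whose entry in the pivot column is largest in absolute value.

```python
def partial_pivot_row(M, r, c):
    """Index of the row at or below row r with the largest |entry| in column c (0-based)."""
    return max(range(r, len(M)), key=lambda i: abs(M[i][c]))

sample = [[1.0, 2.0],
          [-4.0, 0.0],
          [2.0, 5.0]]
chosen = partial_pivot_row(sample, 0, 0)
print(chosen)   # row index of the entry -4.0, the largest in absolute value in column 0
```

A program would then interchange row `r` with the chosen row before creating zeros below the pivot.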


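In Example 3, when we cover the row containing the second pivot position for step 4, we are left with
a new submatrix having only one row:

\[
\begin{bmatrix}
3 & -9 & 12 & -9 & 6 & 15\\
0 & 2 & -4 & 4 & 2 & -6\\
0 & 0 & 0 & 0 & \boxed{1} & 4
\end{bmatrix}
\]

(The boxed entry is the pivot.)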

Solutions of Linear Systems

The row reduction algorithm leads directly to an explicit description of the solution set 
of a linear system when the algorithm is applied to the augmented matrix of the system. 

Suppose, for example, that the augmented matrix of a linear system has been 
changed into the equivalent reduced echelon form 

\[
\begin{bmatrix}
1 & 0 & -5 & 1\\
0 & 1 & 1 & 4\\
0 & 0 & 0 & 0
\end{bmatrix}
\]

There are three variables because the augmented matrix has four columns. The 
associated system of equations is 


\[
\begin{aligned}
x_1 - 5x_3 &= 1\\
x_2 + x_3 &= 4\\
0 &= 0
\end{aligned}
\tag{4}
\]

The variables $x_1$ and $x_2$ corresponding to pivot columns in the matrix are called basic
variables.² The other variable, $x_3$, is called a free variable.

Whenever a system is consistent, as in (4), the solution set can be described 
explicitly by solving the reduced system of equations for the basic variables in terms of 
the free variables. This operation is possible because the reduced echelon form places 
each basic variable in one and only one equation. In (4), solve the first equation for $x_1$
and the second for $x_2$. (Ignore the third equation; it offers no restriction on the variables.)

\[
\begin{cases}
x_1 = 1 + 5x_3\\
x_2 = 4 - x_3\\
x_3 \text{ is free}
\end{cases}
\tag{5}
\]

The statement “$x_3$ is free” means that you are free to choose any value for $x_3$. Once
that is done, the formulas in (5) determine the values for $x_1$ and $x_2$. For instance, when
$x_3 = 0$, the solution is $(1, 4, 0)$; when $x_3 = 1$, the solution is $(6, 3, 1)$. Each different
choice of $x_3$ determines a (different) solution of the system, and every solution of the
system is determined by a choice of $x_3$.
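
The description in (5) can be read as a function of the free variable. The following sketch (with hypothetical names `solution_from_free` and `satisfies_system_4`) generates solutions from several choices of $x_3$ and substitutes each one back into system (4).

```python
def solution_from_free(x3):
    """Solution of system (4) determined by the value of the free variable x3, using (5)."""
    return (1 + 5 * x3, 4 - x3, x3)

def satisfies_system_4(x1, x2, x3):
    """Check the two nontrivial equations of system (4)."""
    return x1 - 5 * x3 == 1 and x2 + x3 == 4

samples = [solution_from_free(t) for t in range(-2, 3)]
all_valid = all(satisfies_system_4(*s) for s in samples)
print(samples, all_valid)
```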


EXAMPLE 4 Find the general solution of the linear system whose augmented matrix
has been reduced to

\[
\begin{bmatrix}
1 & 6 & 2 & -5 & -2 & -4\\
0 & 0 & 2 & -8 & -1 & 3\\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}
\]

SOLUTION The matrix is in echelon form, but we want the reduced echelon form
before solving for the basic variables. The row reduction is completed next. The symbol
$\sim$ before a matrix indicates that the matrix is row equivalent to the preceding matrix.


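\[
\begin{aligned}
\begin{bmatrix}
1 & 6 & 2 & -5 & -2 & -4\\
0 & 0 & 2 & -8 & -1 & 3\\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}
&\sim
\begin{bmatrix}
1 & 6 & 2 & -5 & 0 & 10\\
0 & 0 & 2 & -8 & 0 & 10\\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}\\
&\sim
\begin{bmatrix}
1 & 6 & 2 & -5 & 0 & 10\\
0 & 0 & 1 & -4 & 0 & 5\\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}
\sim
\begin{bmatrix}
1 & 6 & 0 & 3 & 0 & 0\\
0 & 0 & 1 & -4 & 0 & 5\\
0 & 0 & 0 & 0 & 1 & 7
\end{bmatrix}
\end{aligned}
\]
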


² Some texts use the term leading variables because they correspond to the columns containing leading
entries.

There are five variables because the augmented matrix has six columns. The associated
system now is 

\[
\begin{aligned}
x_1 + 6x_2 + 3x_4 &= 0\\
x_3 - 4x_4 &= 5\\
x_5 &= 7
\end{aligned}
\tag{6}
\]

The pivot columns of the matrix are 1, 3, and 5, so the basic variables are $x_1$, $x_3$, and
$x_5$. The remaining variables, $x_2$ and $x_4$, must be free. Solve for the basic variables to
obtain the general solution:

\[
\begin{cases}
x_1 = -6x_2 - 3x_4\\
x_2 \text{ is free}\\
x_3 = 5 + 4x_4\\
x_4 \text{ is free}\\
x_5 = 7
\end{cases}
\tag{7}
\]

Note that the value of $x_5$ is already fixed by the third equation in system (6). ■
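
The classification of variables in Example 4 depends only on where the leading entries sit. As a rough sketch, the hypothetical helper `basic_and_free` below reads the basic and free variables off an augmented matrix that is assumed to be already in reduced echelon form and to represent a consistent system; variables are numbered from 1, as in the text.

```python
def basic_and_free(rref_rows):
    """Return (basic, free) variable numbers for an augmented matrix in reduced echelon form."""
    n_vars = len(rref_rows[0]) - 1          # the last column is the right-hand side
    basic = []
    for row in rref_rows:
        # Position of the leading entry among the coefficient columns, if any.
        lead = next((j for j, v in enumerate(row[:n_vars]) if v != 0), None)
        if lead is not None:
            basic.append(lead + 1)
    free = [j for j in range(1, n_vars + 1) if j not in basic]
    return basic, free

example4_rref = [[1, 6, 0, 3, 0, 0],
                 [0, 0, 1, -4, 0, 5],
                 [0, 0, 0, 0, 1, 7]]
basic, free = basic_and_free(example4_rref)
print(basic, free)
```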

Parametric Descriptions of Solution Sets 

The descriptions in (5) and (7) are parametric descriptions of solution sets in which 
the free variables act as parameters. Solving a system amounts to finding a parametric 
description of the solution set or determining that the solution set is empty. 

Whenever a system is consistent and has free variables, the solution set has many
parametric descriptions. For instance, in system (4), we may add 5 times equation 2 to
equation 1 and obtain the equivalent system

    x_1 + 5x_2       = 21
           x_2 + x_3 = 4

We could treat x_2 as a parameter and solve for x_1 and x_3 in terms of x_2, and we would
have an accurate description of the solution set. However, to be consistent, we make the
(arbitrary) convention of always using the free variables as the parameters for describing
a solution set. (The answer section at the end of the text also reflects this convention.)

Whenever a system is inconsistent, the solution set is empty, even when the system 
has free variables. In this case, the solution set has no parametric representation. 

Back-Substitution 

Consider the following system, whose augmented matrix is in echelon form but is not
in reduced echelon form:

    x_1 - 7x_2 + 2x_3 - 5x_4 + 8x_5 = 10
           x_2 - 3x_3 + 3x_4 +  x_5 = -5
                         x_4 -  x_5 = 4

A computer program would solve this system by back-substitution, rather than by computing
the reduced echelon form. That is, the program would solve equation 3 for x_4 in
terms of x_5 and substitute the expression for x_4 into equation 2, solve equation 2 for x_2,
and then substitute the expressions for x_2 and x_4 into equation 1 and solve for x_1.
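
As a rough illustration of back-substitution, here is a short Python sketch. It assumes the three equations read as shown in the code comments, and it treats x_3 and x_5 as free variables whose values are supplied by the caller; the function name back_substitute is hypothetical.

```python
from fractions import Fraction

def back_substitute(x3, x5):
    """Solve the echelon system for x1, x2, x4, working from the last
    equation upward, given values of the free variables x3 and x5.

    Assumed system:
        x1 - 7*x2 + 2*x3 - 5*x4 + 8*x5 = 10
              x2 - 3*x3 + 3*x4 +   x5 = -5
                            x4 -   x5 = 4
    """
    x3, x5 = Fraction(x3), Fraction(x5)
    x4 = 4 + x5                                  # equation 3
    x2 = -5 + 3 * x3 - 3 * x4 - x5               # equation 2
    x1 = 10 + 7 * x2 - 2 * x3 + 5 * x4 - 8 * x5  # equation 1
    return x1, x2, x3, x4, x5

print(back_substitute(0, 0))
```

Each choice of x_3 and x_5 yields one solution; the arithmetic is the same as in the backward phase of row reduction, only organized equation by equation.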

Our matrix format for the backward phase of row reduction, which produces the
reduced echelon form, has the same number of arithmetic operations as back-substitution.
But the discipline of the matrix format substantially reduces the likelihood of errors
during hand computations. The best strategy is to use only the reduced echelon form
to solve a system! The Study Guide that accompanies this text offers several helpful
suggestions for performing row operations accurately and rapidly.

NUMERICAL NOTE

In general, the forward phase of row reduction takes much longer than the
backward phase. An algorithm for solving a system is usually measured in flops
(or floating point operations). A flop is one arithmetic operation (+, -, *, /)
on two real floating point numbers.3 For an n x (n + 1) matrix, the reduction
to echelon form can take 2n^3/3 + n^2/2 - 7n/6 flops (which is approximately
2n^3/3 flops when n is moderately large, say, n > 30). In contrast, further
reduction to reduced echelon form needs at most n^2 flops.
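
To get a feel for these counts, the short Python sketch below evaluates them for two sample sizes. It takes the forward count to be 2n^3/3 + n^2/2 - 7n/6 and the backward bound to be n^2, as quoted in the note; the function names are hypothetical.

```python
def forward_flops(n):
    """Flop count 2n^3/3 + n^2/2 - 7n/6 for reducing an n x (n+1)
    matrix to echelon form."""
    return 2 * n**3 / 3 + n**2 / 2 - 7 * n / 6

def backward_flops_bound(n):
    """Upper bound n^2 on the flops needed for the backward phase."""
    return n**2

for n in (30, 300):
    f = forward_flops(n)
    b = backward_flops_bound(n)
    # columns: n, exact forward count, leading-term approximation,
    # largest possible share of the work done in the backward phase
    print(n, f, 2 * n**3 / 3, b / (f + b))
```

The last column shrinks as n grows, which is the sense in which the forward phase dominates.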


Existence and Uniqueness Questions 

Although a nonreduced echelon form is a poor tool for solving a system, this form is 
just the right device for answering two fundamental questions posed in Section 1.1. 

EXAMPLE 5 Determine the existence and uniqueness of the solutions to the system

             3x_2 -  6x_3 + 6x_4 + 4x_5 = -5
      3x_1 - 7x_2 +  8x_3 - 5x_4 + 8x_5 = 9
      3x_1 - 9x_2 + 12x_3 - 9x_4 + 6x_5 = 15

SOLUTION The augmented matrix of this system was row reduced in Example 3 to

    [ 3  -9  12  -9  6  15 ]
    [ 0   2  -4   4  2  -6 ]        (8)
    [ 0   0   0   0  1   4 ]

The basic variables are x_1, x_2, and x_5; the free variables are x_3 and x_4. There is no
equation such as 0 = 1 that would indicate an inconsistent system, so we could use
back-substitution to find a solution. But the existence of a solution is already clear
in (8). Also, the solution is not unique because there are free variables. Each different
choice of x_3 and x_4 determines a different solution. Thus the system has infinitely many
solutions. ■

When a system is in echelon form and contains no equation of the form 0 = b, with 
b nonzero, every nonzero equation contains a basic variable with a nonzero coefficient. 
Either the basic variables are completely determined (with no free variables) or at least 
one of the basic variables may be expressed in terms of one or more free variables. In 
the former case, there is a unique solution; in the latter case, there are infinitely many 
solutions (one for each choice of values for the free variables). 

These remarks justify the following theorem. 


3 Traditionally, a flop was only a multiplication or division, because addition and subtraction took much less
time and could be ignored. The definition of flop given here is preferred now, as a result of advances in 
computer architecture. See Golub and Van Loan, Matrix Computations, 2nd ed. (Baltimore: The Johns 
Hopkins Press, 1989), pp. 19-20. 









1.2 Row Reduction and Echelon Forms 21 


THEOREM 2 Existence and Uniqueness Theorem 

A linear system is consistent if and only if the rightmost column of the augmented
matrix is not a pivot column, that is, if and only if an echelon form of the
augmented matrix has no row of the form

    [ 0 ... 0 b ] with b nonzero

If a linear system is consistent, then the solution set contains either (i) a unique
solution, when there are no free variables, or (ii) infinitely many solutions, when
there is at least one free variable.
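
Theorem 2 translates directly into a small test on an echelon form. The Python sketch below is one possible reading, under the assumption that the input is already in echelon form; the function name classify and the returned labels are illustrative only.

```python
def classify(echelon):
    """Classify a linear system from an echelon form of its augmented matrix.

    Returns 'inconsistent', 'unique', or 'infinitely many'.
    Assumes the rows are already in echelon form.
    """
    n_vars = len(echelon[0]) - 1
    pivot_cols = []
    for row in echelon:
        # the leading entry of a nonzero row marks a pivot column
        lead = next((j for j, x in enumerate(row) if x != 0), None)
        if lead is not None:
            pivot_cols.append(lead)
    if n_vars in pivot_cols:      # rightmost column is a pivot column
        return "inconsistent"
    # no free variables exactly when every variable column has a pivot
    return "unique" if len(pivot_cols) == n_vars else "infinitely many"

# matrix (8) from Example 5
matrix_8 = [[3, -9, 12, -9, 6, 15],
            [0, 2, -4, 4, 2, -6],
            [0, 0, 0, 0, 1, 4]]
print(classify(matrix_8))  # infinitely many
```

No arithmetic is needed beyond locating the leading entries, which is why a nonreduced echelon form suffices for existence and uniqueness questions.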

The following procedure outlines how to find and describe all solutions of a linear 
system. 


USING ROW REDUCTION TO SOLVE A LINEAR SYSTEM 

1. Write the augmented matrix of the system. 

2. Use the row reduction algorithm to obtain an equivalent augmented matrix in 
echelon form. Decide whether the system is consistent. If there is no solution, 
stop; otherwise, go to the next step. 

3. Continue row reduction to obtain the reduced echelon form. 

4. Write the system of equations corresponding to the matrix obtained in step 3. 

5. Rewrite each nonzero equation from step 4 so that its one basic variable is 
expressed in terms of any free variables appearing in the equation. 


PRACTICE PROBLEMS 

1. Find the general solution of the linear system whose augmented matrix is

    [ 1  -3  -5  0 ]
    [ 0   1   1  3 ]

2. Find the general solution of the system

      x_1 - 2x_2 -  x_3 + 3x_4 = 0
    -2x_1 + 4x_2 + 5x_3 - 5x_4 = 3
     3x_1 - 6x_2 - 6x_3 + 8x_4 = 2


1.2 EXERCISES 

In Exercises 1 and 2, determine which matrices are in reduced
echelon form and which others are only in echelon form.

1. a. [ 1 0 0 0 ]      b. [ 1 0 1 0 ]
      [ 0 1 0 0 ]         [ 0 1 1 0 ]
      [ 0 0 1 1 ]         [ 0 0 0 1 ]

   c. [ 1 0 0 0 ]      d. [ 1 1 0 1 1 ]
      [ 0 1 1 0 ]         [ 0 2 0 2 2 ]
      [ 0 0 0 0 ]         [ 0 0 0 3 3 ]
      [ 0 0 0 1 ]         [ 0 0 0 0 4 ]

"0

1 -2 

3' 

10 . 

'1 

-2 

-1 

4" 

1 

-3 4 

-6 

-2 

4 

-5 

6_ 


Row reduce the matrices in Exercises 3 and 4 to reduced echelon
form. Circle the pivot positions in the final matrix and in the
original matrix, and list the pivot columns.


5. Describe the possible echelon forms of a nonzero 2x2 
matrix. Use the symbols ■, *, and 0, as in the first part of 
Example 1. 

6. Repeat Exercise 5 for a nonzero 3x2 matrix. 

Find the general solutions of the systems whose augmented
matrices are given in Exercises 7-14.


7 . 


9 . 


11 . 


13 . 


14 . 


Exercises 15 and 16 use the notation of Example 1 for matrices 
in echelon form. Suppose each matrix represents the augmented 
matrix for a system of linear equations. In each case, determine if 
the system is consistent. If the system is consistent, determine if 
the solution is unique. 

■ 氺 氺氺 

15 . a. 0 ■ * * 


1 0 -9 0 4' 

0130-1 
0 0 0 1 -7 

0 0 0 0 1 


1 -3 
0 1 
0 0 
0 0 


0 -1 
0 0 

0 1 

0 0 


b. 


0 ■氺 氺氺 
0 0 ■氺氺 
0 0 0 -0 


In Exercises 19 and 20, choose h and k such that the system has (a)
no solution, (b) a unique solution, and (c) many solutions. Give
separate answers for each part.


19.  x_1 + hx_2 = 2
    4x_1 + 8x_2 = k


20 . X\ — 3^2 

2 x\ + hx 2 


In Exercises 21 and 22, mark each statement True or False. Justify 

each answer. 4 

21. a. In some cases, a matrix may be row reduced to more 

than one matrix in reduced echelon form, using different 
sequences of row operations. 

b. The row reduction algorithm applies only to augmented 
matrices for a linear system. 

c. A basic variable in a linear system is a variable that 
corresponds to a pivot column in the coefficient matrix. 

d. Finding a parametric description of the solution set of a 
linear system is the same as solving the system. 

e. If one row in an echelon form of an augmented matrix 
is [0 0 0 5 0 ], then the associated linear system is 
inconsistent. 

22 . a. The reduced echelon form of a matrix is unique. 

b. If every column of an augmented matrix contains a pivot, 
then the corresponding system is consistent. 

c. The pivot positions in a matrix depend on whether row 
interchanges are used in the row reduction process. 

d. A general solution of a system is an explicit description 
of all solutions of the system. 

e. Whenever a system has free variables, the solution set 
contains many solutions. 

23 . Suppose the coefficient matrix of a linear system of four 
equations in four variables has a pivot in each column. Ex¬ 
plain why the system has a unique solution. 

24 . Suppose a system of linear equations has a 3 x 5 augmented 
matrix whose fifth column is not a pivot column. Is the 
system consistent? Why (or why not)? 


4 True/false questions of this type will appear in many sections. Methods 
for justifying your answers were described before Exercises 23 and 24 in 
Section 1.1. 


■ 氺 氺氺 
0 ■氺氺 
0 0 0 0 


1 0-5 0-8 3 
0 14-106 
0 0 0 0 1 0 
0 0 0 0 0 0 


3-240 

9 -6 12 0 12 . 

6-480 


2. a. 


10 11 
0 111 
0 0 0 0 

0 0 0 0 ' 

12 0 0 
0 0 10 
0 0 0 1 


b. 


1 

0 

0 

0 


■ 

氺 

氺 



0 

2 

0 

0 

16 . a. 

0 

■ 

氺 



0 

0 

1 

1 


_0 

0 

■ 








_ ■ 

氺 

氺 

氺 

氺 





b. 

0 

0 

■ 

氺 

氺 






0 

0 

0 

■ 

氺 


In Exercises 17 and 18, determine the value(s) of h such that the 
matrix is the augmented matrix of a consistent linear system. 


3 . 


1 

2 

4 

8 


1 

2 

4 

5 

2 

4 

6 

8 

4 . 

2 

4 

5 

4 

3 

6 

9 

12 


4 

5 

4 

2 


8 . 


4 焱 


17 


d. 


1 

3 

4 

7 

8. 

1 

-3 

0 

-5 

3 

9 

7 

6 

-3 

7 

0 

9

25. Suppose the coefficient matrix of a system of linear equations
has a pivot position in every row. Explain why the system is
consistent.

26. Suppose a 3 x 5 coefficient matrix for a system has three 
pivot columns. Is the system consistent? Why or why not? 

27. Restate the last sentence in Theorem 2 using the concept of
pivot columns: "If a linear system is consistent, then the
solution is unique if and only if ____."

28. What would you have to know about the pivot columns in an 
augmented matrix in order to know that the linear system is 
consistent and has a unique solution? 

29. A system of linear equations with fewer equations than unknowns
is sometimes called an underdetermined system. Can
such a system have a unique solution? Explain.

30. Give an example of an inconsistent underdetermined system 
of two equations in three unknowns. 

31. A system of linear equations with more equations than unknowns
is sometimes called an overdetermined system. Can
such a system be consistent? Illustrate your answer with a
specific system of three equations in two unknowns.

32. Suppose an n x (n + 1) matrix is row reduced to reduced
echelon form. Approximately what fraction of the total 
number of operations (flops) is involved in the backward 
phase of the reduction when n = 20? when n = 200? 

Suppose experimental data are represented by a set of points in the
plane. An interpolating polynomial for the data is a polynomial
whose graph passes through every point. In scientific work,
such a polynomial can be used, for example, to estimate values
between the known data points. Another use is to create curves for
graphical images on a computer screen. One method for finding an
interpolating polynomial is to solve a system of linear equations.

WEB 

33. Find the interpolating polynomial p(t) = a_0 + a_1 t + a_2 t^2
for the data (1,6), (2,15), (3,28). That is, find a_0, a_1, and
a_2 such that

    a_0 + a_1(1) + a_2(1)^2 = 6
    a_0 + a_1(2) + a_2(2)^2 = 15
    a_0 + a_1(3) + a_2(3)^2 = 28

34. [M] In a wind tunnel experiment, the force on a projectile
due to air resistance was measured at different velocities:

    Velocity (100 ft/sec)   0   2      4      6      8      10
    Force (100 lb)          0   2.90   14.8   39.6   74.3   119

Find an interpolating polynomial for these data and estimate
the force on the projectile when the projectile is traveling
at 750 ft/sec. Use p(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + a_5 t^5.
What happens if you try to use a polynomial of degree
less than 5? (Try a cubic polynomial, for instance.)5


5 Exercises marked with the symbol [M] are designed to be worked with
the aid of a "Matrix program" (a computer program, such as
MATLAB®, Maple™, Mathematica®, MathCad®, or Derive™, or a
programmable calculator with matrix capabilities, such as those
manufactured by Texas Instruments or Hewlett-Packard).


SOLUTIONS TO PRACTICE PROBLEMS 



The general solution of the 
system of equations is the line of 
intersection of the two planes. 


1. The reduced echelon form of the augmented matrix and the corresponding system
are

    [ 1  0  -2  9 ]          x_1 - 2x_3 = 9
    [ 0  1   1  3 ]   and    x_2 +  x_3 = 3

The basic variables are x_1 and x_2, and the general solution is

    x_1 = 9 + 2x_3
    x_2 = 3 - x_3
    x_3 is free

Note: It is essential that the general solution describe each variable, with any
parameters clearly identified. The following statement does not describe the solution:

    x_1 = 9 + 2x_3
    x_2 = 3 - x_3
    x_3 = 3 - x_2        Incorrect solution

This description implies that x_2 and x_3 are both free, which certainly is not the case.

2. Row reduce the system's augmented matrix:

    [  1  -2  -1   3  0 ]     [ 1  -2  -1   3  0 ]     [ 1  -2  -1  3  0 ]
    [ -2   4   5  -5  3 ]  〜  [ 0   0   3   1  3 ]  〜  [ 0   0   3  1  3 ]
    [  3  -6  -6   8  2 ]     [ 0   0  -3  -1  2 ]     [ 0   0   0  0  5 ]


This echelon matrix shows that the system is inconsistent, because its rightmost 
column is a pivot column; the third row corresponds to the equation 0 = 5. There 
is no need to perform any more row operations. Note that the presence of the free 
variables in this problem is irrelevant because the system is inconsistent. 


1.3 VECTOR EQUATIONS 

Important properties of linear systems can be described with the concept and notation 
of vectors. This section connects equations involving vectors to ordinary systems of 
equations. The term vector appears in a variety of mathematical and physical contexts, 
which we will discuss in Chapter 4, “Vector Spaces.” Until then, vector will mean an 
ordered list of numbers. This simple idea enables us to get to interesting and important 
applications as quickly as possible. 


Vectors in R^2


A matrix with only one column is called a column vector, or simply a vector. Examples 
of vectors with two entries are 


-1 


w 


W\ 

W 2 


where w_1 and w_2 are any real numbers. The set of all vectors with two entries is denoted
by R^2 (read "r-two"). The R stands for the real numbers that appear as entries in the
vectors, and the exponent 2 indicates that each vector contains two entries.1

Two vectors in R^2 are equal if and only if their corresponding entries are equal. Thus

    [ 4 ]        [ 7 ]
    [ 7 ]  and   [ 4 ]

are not equal, because vectors in R^2 are ordered pairs of real numbers.

Given two vectors u and v in R^2, their sum is the vector u + v obtained by adding
corresponding entries of u and v. For example,

    [  1 ]   [ 2 ]   [  1 + 2 ]   [ 3 ]
    [ -2 ] + [ 5 ] = [ -2 + 5 ] = [ 3 ]

Given a vector u and a real number c, the scalar multiple of u by c is the vector cu
obtained by multiplying each entry in u by c. For instance,

    if u = [  3 ]  and c = 5,  then cu = 5 [  3 ] = [ 15 ]
           [ -1 ]                          [ -1 ]   [ -5 ]
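
Both operations act entry by entry, which makes them easy to sketch in code. The following Python fragment is only an illustration: vectors are stored as plain lists, and the helper names add and scale are hypothetical.

```python
def add(u, v):
    """Entrywise sum of two vectors with the same number of entries."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of entries")
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    """Scalar multiple of u by c."""
    return [c * a for a in u]

print(add([1, -2], [2, 5]))   # [3, 3]
print(scale(5, [3, -1]))      # [15, -5]
```

The two printed results reproduce the sum and the scalar multiple worked out above.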


1 Most of the text concerns vectors and matrices that have only real entries. However, all definitions and
theorems in Chapters 1-5, and in most of the rest of the text, remain valid if the entries are complex
numbers. Complex vectors and matrices arise naturally, for example, in electrical engineering and physics.

The number c in cu is called a scalar; it is written in lightface type to distinguish it from
the boldface vector u.

The operations of scalar multiplication and vector addition can be combined, as in 
the following example. 


EXAMPLE 1 Given u 

SOLUTION 


and v 


(-3)v 


,find 4u, (—3)y, and 4u + (— 3)y. 


-6 

15 


and 


4u + (—3)v 


—6 

15 


■ 


Sometimes, for convenience (and also to save space), this text may write a column
vector such as

    [  3 ]
    [ -1 ]

in the form (3, -1). In this case, the parentheses and the comma distinguish the vector
(3, -1) from the 1 x 2 row matrix [ 3 -1 ], written with brackets and no comma. Thus

    [  3 ]
    [ -1 ]  ≠  [ 3  -1 ]

because the matrices have different shapes, even though they have the same entries.


Geometric Descriptions of R^2

Consider a rectangular coordinate system in the plane. Because each point in the plane
is determined by an ordered pair of numbers, we can identify a geometric point (a, b)
with the column vector

    [ a ]
    [ b ]

So we may regard R^2 as the set of all points in the plane. See Fig. 1.

FIGURE 1 Vectors as points.    FIGURE 2 Vectors with arrows.

The geometric visualization of a vector such as (3, -1) is often aided by including
an arrow (directed line segment) from the origin (0,0) to the point (3, -1), as in Fig. 2.
In this case, the individual points along the arrow itself have no special significance.2

The sum of two vectors has a useful geometric representation. The following rule 
can be verified by analytic geometry. 


2 In physics, arrows can represent forces and usually are free to move about in space. This interpretation of
vectors will be discussed in Section 4.1.

Parallelogram Rule for Addition

If u and v in R^2 are represented as points in the plane, then u + v corresponds to
the fourth vertex of the parallelogram whose other vertices are u, 0, and v. See
Fig. 3.

FIGURE 3 The parallelogram rule.


EXAMPLE 2 

in Fig. 4. 


The vectors u = 


2 

2 


,and u + v 



are displayed 


■ 





FIGURE 4 


The next example illustrates the fact that the set of all scalar multiples of one fixed 
nonzero vector is a line through the origin, (0,0). 


EXAMPLE 3 Let u = (3, -1). Display the vectors u, 2u, and -(2/3)u on a graph.

SOLUTION See Fig. 5, where u, 2u = (6, -2), and -(2/3)u = (-2, 2/3) are displayed. The
arrow for 2u is twice as long as the arrow for u, and the arrows point in the same
direction. The arrow for -(2/3)u is two-thirds the length of the arrow for u, and the arrows
point in opposite directions. In general, the length of the arrow for cu is |c| times the
length of the arrow for u. [Recall that the length of the line segment from (0,0) to (a, b)
is sqrt(a^2 + b^2). We shall discuss this further in Chapter 6.] ■

FIGURE 5 Typical multiples of u. The set of all multiples of u.

FIGURE 6 Scalar multiples.


Vectors in R^3

Vectors in R^3 are 3 x 1 column matrices with three entries. They are represented
geometrically by points in a three-dimensional coordinate space, with arrows from the
origin sometimes included for visual clarity. The vectors a = (2, 3, 4) and 2a are
displayed in Fig. 6.


Vectors in R^n

If n is a positive integer, R^n (read "r-n") denotes the collection of all lists (or ordered
n-tuples) of n real numbers, usually written as n x 1 column matrices, such as

        [ u_1 ]
    u = [ u_2 ]
        [ ... ]
        [ u_n ]

The vector whose entries are all zero is called the zero vector and is denoted by 0.
(The number of entries in 0 will be clear from the context.)

Equality of vectors in R^n and the operations of scalar multiplication and vector
addition in R^n are defined entry by entry just as in R^2. These operations on vectors
have the following properties, which can be verified directly from the corresponding
properties for real numbers. See Practice Problem 1 and Exercises 33 and 34 at the end
of this section.


x i 



FIGURE 7 

Vector subtraction. 


Algebraic Properties of R^n

For all u, v, w in R^n and all scalars c and d:

(i) u + v = v + u
(ii) (u + v) + w = u + (v + w)
(iii) u + 0 = 0 + u = u
(iv) u + (-u) = -u + u = 0,
     where -u denotes (-1)u
(v) c(u + v) = cu + cv
(vi) (c + d)u = cu + du
(vii) c(du) = (cd)(u)
(viii) 1u = u

For simplicity of notation, a vector such as u + (-1)v is often written as u - v.
Figure 7 shows u - v as the sum of u and -v.
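
The eight properties can be spot-checked numerically. The Python sketch below tests them on one arbitrary choice of integer vectors and scalars; this is a sanity check on sample data, not a proof, and the helper names are hypothetical.

```python
def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, u):
    return [c * a for a in u]

def check_properties(u, v, w, c, d):
    """Map each property label (i)-(viii) to True/False for the given inputs."""
    zero = [0] * len(u)
    neg_u = scale(-1, u)
    return {
        "i": add(u, v) == add(v, u),
        "ii": add(add(u, v), w) == add(u, add(v, w)),
        "iii": add(u, zero) == u and add(zero, u) == u,
        "iv": add(u, neg_u) == zero and add(neg_u, u) == zero,
        "v": scale(c, add(u, v)) == add(scale(c, u), scale(c, v)),
        "vi": scale(c + d, u) == add(scale(c, u), scale(d, u)),
        "vii": scale(c, scale(d, u)) == scale(c * d, u),
        "viii": scale(1, u) == u,
    }

results = check_properties([1, -2, 3], [4, 0, -1], [-5, 2, 2], c=3, d=-2)
print(all(results.values()))  # True
```

Integer entries are used so that every comparison is exact; with floating point entries some identities would hold only up to rounding.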

Linear Combinations 

Given vectors v_1, v_2, ..., v_p in R^n and given scalars c_1, c_2, ..., c_p, the vector y defined
by

    y = c_1 v_1 + ... + c_p v_p

is called a linear combination of v_1, ..., v_p with weights c_1, ..., c_p. Property (ii)
above permits us to omit parentheses when forming such a linear combination. The
weights in a linear combination can be any real numbers, including zero. For example,
some linear combinations of vectors v_1 and v_2 are

    sqrt(3) v_1 + v_2,   (1/2)v_1 (= (1/2)v_1 + 0v_2),   and   0 (= 0v_1 + 0v_2)


EXAMPLE 4 Figure 8 identifies selected linear combinations of Vi 
_ 2 _ 


and 


v 2 


.(Note that sets of parallel grid lines are drawn through integer multiples of 


Vi and V2.) Estimate the linear combinations of Vi and \2 that generate the vectors u 
and w. 




FIGURE 9 


SOLUTION The parallelogram rule shows that u is the sum of 3v_1 and -2v_2; that is,

    u = 3v_1 - 2v_2

This expression for u can be interpreted as instructions for traveling from the origin
to u along two straight paths. First, travel 3 units in the v_1 direction to 3v_1, and then
travel -2 units in the v_2 direction (parallel to the line through v_2 and 0). Next, although
the vector w is not on a grid line, w appears to be about halfway between two pairs of
grid lines, at the vertex of a parallelogram determined by (5/2)v_1 and (-1/2)v_2. (See
Fig. 9.) Thus a reasonable estimate for w is

    w = (5/2)v_1 - (1/2)v_2        ■

The next example connects a problem about linear combinations to the fundamental 
existence question studied in Sections 1.1 and 1.2. 


EXAMPLE 5 Let a_1 = (1, -2, -5), a_2 = (2, 5, 6), and b = (7, 4, -3). Determine whether
b can be generated (or written) as a linear combination of a_1 and a_2. That is, determine
whether weights x_1 and x_2 exist such that

    x_1 a_1 + x_2 a_2 = b        (1)

If vector equation (1) has a solution, find it.

SOLUTION Use the definitions of scalar multiplication and vector addition to rewrite
the vector equation

        [  1 ]       [ 2 ]   [  7 ]
    x_1 [ -2 ] + x_2 [ 5 ] = [  4 ]
        [ -5 ]       [ 6 ]   [ -3 ]
          a_1         a_2      b

which is the same as

    [   x_1 ]   [ 2x_2 ]   [  7 ]
    [ -2x_1 ] + [ 5x_2 ] = [  4 ]
    [ -5x_1 ]   [ 6x_2 ]   [ -3 ]

and

    [   x_1 + 2x_2 ]   [  7 ]
    [ -2x_1 + 5x_2 ] = [  4 ]        (2)
    [ -5x_1 + 6x_2 ]   [ -3 ]

The vectors on the left and right sides of (2) are equal if and only if their corresponding
entries are both equal. That is, x_1 and x_2 make the vector equation (1) true if and only
if x_1 and x_2 satisfy the system

      x_1 + 2x_2 = 7
    -2x_1 + 5x_2 = 4        (3)
    -5x_1 + 6x_2 = -3

To solve this system, row reduce the augmented matrix of the system as follows:3

    [  1  2   7 ]     [ 1   2   7 ]     [ 1   2   7 ]     [ 1  0  3 ]
    [ -2  5   4 ]  〜  [ 0   9  18 ]  〜  [ 0   1   2 ]  〜  [ 0  1  2 ]
    [ -5  6  -3 ]     [ 0  16  32 ]     [ 0  16  32 ]     [ 0  0  0 ]

The solution of (3) is x_1 = 3 and x_2 = 2. Hence b is a linear combination of a_1 and a_2,
with weights x_1 = 3 and x_2 = 2. That is,

      [  1 ]     [ 2 ]   [  7 ]
    3 [ -2 ] + 2 [ 5 ] = [  4 ]        ■
      [ -5 ]     [ 6 ]   [ -3 ]
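
The weights found in Example 5 are easy to confirm by direct computation. The Python sketch below forms the linear combination entry by entry; the function name linear_combination is hypothetical, and the vectors are those of the example, namely a_1 = (1, -2, -5), a_2 = (2, 5, 6), and b = (7, 4, -3).

```python
def linear_combination(weights, vectors):
    """Return the sum of weights[i] * vectors[i], computed entry by entry."""
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(weights, vectors)) for i in range(n)]

a1, a2, b = [1, -2, -5], [2, 5, 6], [7, 4, -3]
print(linear_combination([3, 2], [a1, a2]) == b)  # True
```

Checking a proposed set of weights in this way is much cheaper than finding them, which requires solving the linear system.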


Observe in Example 5 that the original vectors a_1, a_2, and b are the columns of the
augmented matrix that we row reduced:

    [  1   2   7 ]
    [ -2   5   4 ]
    [ -5   6  -3 ]
      a_1  a_2  b

For brevity, write this matrix in a way that identifies its columns, namely,

    [ a_1  a_2  b ]        (4)

It is clear how to write this augmented matrix immediately from vector equation (1),
without going through the intermediate steps of Example 5. Take the vectors in the
order in which they appear in (1) and put them into the columns of a matrix as in (4).
The discussion above is easily modified to establish the following fundamental fact.

A vector equation

    x_1 a_1 + x_2 a_2 + ... + x_n a_n = b

has the same solution set as the linear system whose augmented matrix is

    [ a_1  a_2  ...  a_n  b ]        (5)

In particular, b can be generated by a linear combination of a_1, ..., a_n if and only
if there exists a solution to the linear system corresponding to the matrix (5).

3 The symbol 〜 between matrices denotes row equivalence (Section 1.2).

One of the key ideas in linear algebra is to study the set of all vectors that can be
generated or written as a linear combination of a fixed set {v_1, ..., v_p} of vectors.

DEFINITION

If v_1, ..., v_p are in R^n, then the set of all linear combinations of v_1, ..., v_p
is denoted by Span{v_1, ..., v_p} and is called the subset of R^n spanned (or
generated) by v_1, ..., v_p. That is, Span{v_1, ..., v_p} is the collection of all
vectors that can be written in the form

    c_1 v_1 + c_2 v_2 + ... + c_p v_p

with c_1, ..., c_p scalars.


Asking whether a vector b is in Span{v_1, ..., v_p} amounts to asking whether the
vector equation

    x_1 v_1 + x_2 v_2 + ... + x_p v_p = b

has a solution, or, equivalently, asking whether the linear system with augmented matrix
[ v_1 ... v_p b ] has a solution.

Note that Span{v_1, ..., v_p} contains every scalar multiple of v_1 (for example),
since c v_1 = c v_1 + 0v_2 + ... + 0v_p. In particular, the zero vector must be in
Span{v_1, ..., v_p}.


A Geometric Description of Span{v} and Span{u, v}

Let v be a nonzero vector in R^3. Then Span{v} is the set of all scalar multiples of v,
which is the set of points on the line in R^3 through v and 0. See Fig. 10.

If u and v are nonzero vectors in R^3, with v not a multiple of u, then Span{u, v} is
the plane in R^3 that contains u, v, and 0. In particular, Span{u, v} contains the line in
R^3 through u and 0 and the line through v and 0. See Fig. 11.

FIGURE 10 Span{v} as a line through the origin.

FIGURE 11 Span{u, v} as a plane through the origin.


EXAMPLE 6 Let a_1 = (1, -2, 3), a_2 = (5, -13, -3), and b = (-3, 8, 1). Then
Span{a_1, a_2} is a plane through the origin in R^3. Is b in that plane?

SOLUTION Does the equation x_1 a_1 + x_2 a_2 = b have a solution? To answer this, row
reduce the augmented matrix [ a_1 a_2 b ]:

    [  1    5  -3 ]     [ 1    5  -3 ]     [ 1   5  -3 ]
    [ -2  -13   8 ]  〜  [ 0   -3   2 ]  〜  [ 0  -3   2 ]
    [  3   -3   1 ]     [ 0  -18  10 ]     [ 0   0  -2 ]

The third equation is 0 = -2, which shows that the system has no solution. The vector
equation x_1 a_1 + x_2 a_2 = b has no solution, and so b is not in Span{a_1, a_2}. ■
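
The span question reduces to a consistency test, which can be sketched in a few lines of Python. The function name in_span is hypothetical; the code row reduces the augmented matrix to echelon form with exact fractions and reports whether the rightmost column turns out to be a pivot column. It is tried here on the data of Example 5, a_1 = (1, -2, -5), a_2 = (2, 5, 6), b = (7, 4, -3), and of Example 6, a_1 = (1, -2, 3), a_2 = (5, -13, -3), b = (-3, 8, 1).

```python
from fractions import Fraction

def in_span(vectors, b):
    """Return True if b is a linear combination of the given vectors.

    Builds the augmented matrix [v_1 ... v_p b], runs the forward phase of
    row reduction, and checks whether the last column is a pivot column.
    """
    rows = [[Fraction(v[i]) for v in vectors] + [Fraction(b[i])]
            for i in range(len(b))]
    n_cols = len(vectors) + 1
    r = 0  # row where the next pivot will be placed
    for c in range(n_cols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        if c == n_cols - 1:
            return False          # pivot in the rightmost column
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return True

print(in_span([[1, -2, -5], [2, 5, 6]], [7, 4, -3]))    # True  (Example 5)
print(in_span([[1, -2, 3], [5, -13, -3]], [-3, 8, 1]))  # False (Example 6)
```

Only the forward phase is needed, since the question is one of existence rather than of finding the weights.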

Linear Combinations in Applications 

The final example shows how scalar multiples and linear combinations can arise when 
a quantity such as “cost” is broken down into several categories. The basic principle for 
the example concerns the cost of producing several units of an item when the cost per 
unit is known: 

    (number of units) x (cost per unit) = (total cost)

EXAMPLE 7 A company manufactures two products. For $1.00 worth of product
B, the company spends $.45 on materials, $.25 on labor, and $.15 on overhead. For
$1.00 worth of product C, the company spends $.40 on materials, $.30 on labor, and
$.15 on overhead. Let

        [ .45 ]             [ .40 ]
    b = [ .25 ]   and   c = [ .30 ]
        [ .15 ]             [ .15 ]

Then b and c represent the "costs per dollar of income" for the two products.

a. What economic interpretation can be given to the vector 100b?

b. Suppose the company wishes to manufacture x_1 dollars worth of product B and
x_2 dollars worth of product C. Give a vector that describes the various costs the
company will have (for materials, labor, and overhead).

SOLUTION

a. Compute

               [ .45 ]   [ 45 ]
    100b = 100 [ .25 ] = [ 25 ]
               [ .15 ]   [ 15 ]

The vector 100b lists the various costs for producing $100 worth of product
B, namely, $45 for materials, $25 for labor, and $15 for overhead.

b. The costs of manufacturing x_1 dollars worth of B are given by the vector x_1 b, and
the costs of manufacturing x_2 dollars worth of C are given by x_2 c. Hence the total
costs for both products are given by the vector x_1 b + x_2 c. ■
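
The cost vector of Example 7 can be computed for any production plan. The Python sketch below stores the per-dollar costs (.45, .25, .15 for product B and .40, .30, .15 for product C) as exact fractions to avoid rounding; the function name total_costs and the sample plan of $100 of B with $200 of C are illustrative assumptions.

```python
from fractions import Fraction

# costs per dollar of income: materials, labor, overhead
b = [Fraction(45, 100), Fraction(25, 100), Fraction(15, 100)]
c = [Fraction(40, 100), Fraction(30, 100), Fraction(15, 100)]

def total_costs(x1, x2):
    """Costs (materials, labor, overhead) of making x1 dollars worth of B
    and x2 dollars worth of C, that is, the vector x1*b + x2*c."""
    return [x1 * bi + x2 * ci for bi, ci in zip(b, c)]

print([float(t) for t in total_costs(100, 0)])    # [45.0, 25.0, 15.0]
print([float(t) for t in total_costs(100, 200)])  # [125.0, 85.0, 45.0]
```

The first line reproduces the vector 100b from part (a); the second is one instance of the linear combination in part (b).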


PRACTICE PROBLEMS 

1. Prove that u + v = v + u for any u and v in R^n.

2. For what value(s) of h will y be in Span{v_1, v_2, v_3} if

    v_1 = (1, -1, -2),  v_2 = (5, -4, -7),  v_3 = (-3, 1, 0),  and  y = (-4, 3, h)


1.3 EXERCISES

In Exercises 1 and 2, compute u + v and u - 2v.


In Exercises 13 and 14, determine if b is a linear combination of
the vectors formed from the columns of the matrix A.

13. A = [  1  -4   2 ]      b = [  3 ]
        [  0   3   5 ],         [ -7 ]
        [ -2   8  -4 ]          [ -3 ]

14. A = [  1  0   5 ]      b = [  2 ]
        [ -2  1  -6 ],         [ -1 ]
        [  0  2   8 ]          [  6 ]


15. Let a_1 = (1, 3, -1), a_2 = (-5, -8, 2), and b = (3, -5, h). For what
value(s) of h is b in the plane spanned by a_1 and a_2?


16. Let v_1 = (1, 0, -2), v_2 = (-2, 1, 7), and y = (h, -3, -5). For what
value(s) of h is y in the plane generated by v_1 and v_2?

In Exercises 17 and 18, list five vectors in Span {vi, V 2 }. For each 
vector, show the weights on Vi and V 2 used to generate the vector 
and list the three entries of the vector. Do not make a sketch. 



In Exercises 3 and 4, display the following vectors using arrows
on an xy-graph: u, v, -v, -2v, u + v, u - v, and u - 2v. Notice
that u - v is the vertex of a parallelogram whose other vertices are
u, 0, and -v.

3. u and v as in Exercise 1

4. u and v as in Exercise 2


In Exercises 5 and 6, write a system of equations that is equivalent
to the given vector equation.

5. x_1 (3, -2, 8) + x_2 (5, 0, -9) = (2, -3, 8)

6. x_1 (3, -2) + x_2 (7, 3) + x_3 (-2, 1) = (0, 0)


Use the accompanying figure to write each vector listed in
Exercises 7 and 8 as a linear combination of u and v. Is every vector
in R^2 a linear combination of u and v?



8. Vectors w, x, y, and z 

In Exercises 9 and 10, write a vector equation that is equivalent to 
the given system of equations. 


9.         x_2 + 5x_3 = 0
    4x_1 + 6x_2 -  x_3 = 0
    -x_1 + 3x_2 - 8x_3 = 0


3^i — 2x2 + 4x3 — 3 
~2,X\ _ 1X2 "f - 5x3 = 1 

-\- 4^2 — 3 又 3 = 2 


In Exercises 11 and 12, determine if b is a linear combination of
a_1, a_2, and a_3.

11. a_1 = (1, -2, 0), a_2 = (0, 1, 2), a_3 = (5, -6, 8), b = (2, -1, 6)

12. a_1 = (1, 0, 1), a_2 = (-2, 3, -2), a_3 = (-6, 7, 5), b = (11, -5, 9)



18. v_1 = (1, 1, -2), v_2 = (-2, 3, 0)


19. Give a geometric description of Span{v_1, v_2} for the vectors
v_1 = (8, 2, -6) and v_2 = (12, 3, -9).

20. Give a geometric description of Span{v_1, v_2} for the vectors
in Exercise 18.


21. Let u 


and y : 


Show that 


Span {u, v} for all h and k. 


22. Construct a 3 x 3 matrix A, with nonzero entries, and a vector
b in R^3 such that b is not in the set spanned by the columns
of A.

In Exercises 23 and 24, mark each statement True or False. Justify 
each answer. 


23. a. Another notation for the vector

    [ -4 ]
    [  3 ]

is [ -4  3 ].


b. The points in the plane corresponding to 
lie on a line through the origin. 


and 


c. An example of a linear combination of vectors v_1 and v_2
is the vector (1/2)v_1.

d. The solution set of the linear system whose augmented
matrix is [ a_1 a_2 a_3 b ] is the same as the solution
set of the equation x_1 a_1 + x_2 a_2 + x_3 a_3 = b.

e. The set Span{u, v} is always visualized as a plane through
the origin.

24. a. When u and v are nonzero vectors, Span{u, v} contains
only the line through u and the origin, and the line through
v and the origin.

b. Any list of five real numbers is a vector in R^5.

c. Asking whether the linear system corresponding to
an augmented matrix [ a_1 a_2 a_3 b ] has a solution
amounts to asking whether b is in Span{a_1, a_2, a_3}.

d. The vector v results when a vector u - v is added to the
vector v.

e. The weights c_1, ..., c_p in a linear combination
c_1 v_1 + ... + c_p v_p cannot all be zero.



25. Let $A = \begin{bmatrix} 1 & 0 & -4 \\ 0 & 3 & -2 \\ -2 & 6 & 3 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 4 \\ 1 \\ -4 \end{bmatrix}$. Denote the
columns of A by a1, a2, a3, and let W = Span {a1, a2, a3}.

a. Is b in {a1, a2, a3}? How many vectors are in {a1, a2, a3}?

b. Is b in W? How many vectors are in W?

c. Show that a1 is in W. [Hint: Row operations are unnecessary.]


26. Let $A = \begin{bmatrix} 2 & 0 & 6 \\ -1 & 8 & 5 \\ 1 & -2 & 1 \end{bmatrix}$, let $\mathbf{b} = \begin{bmatrix} 10 \\ 3 \\ 7 \end{bmatrix}$, and let W be
the set of all linear combinations of the columns of A.

a. Is b in W?

b. Show that the second column of A is in W.


27. A mining company has two mines. One day's operation
at mine #1 produces ore that contains 30 metric tons of
copper and 600 kilograms of silver, while one day's operation
at mine #2 produces ore that contains 40 metric tons of
copper and 380 kilograms of silver. Let $\mathbf{v}_1 = \begin{bmatrix} 30 \\ 600 \end{bmatrix}$ and
$\mathbf{v}_2 = \begin{bmatrix} 40 \\ 380 \end{bmatrix}$. Then v1 and v2 represent the "output per day"
of mine #1 and mine #2, respectively.

a. What physical interpretation can be given to the vector
5v1?

b. Suppose the company operates mine #1 for $x_1$ days and
mine #2 for $x_2$ days. Write a vector equation whose
solution gives the number of days each mine should
operate in order to produce 240 tons of copper and 2824
kilograms of silver. Do not solve the equation.

c. [M] Solve the equation in (b).

28. A steam plant burns two types of coal: anthracite (A) and
bituminous (B). For each ton of A burned, the plant produces
27.6 million Btu of heat, 3100 grams (g) of sulfur dioxide,
and 250 g of particulate matter (solid-particle pollutants). For
each ton of B burned, the plant produces 30.2 million Btu,
6400 g of sulfur dioxide, and 360 g of particulate matter.

a. How much heat does the steam plant produce when it
burns $x_1$ tons of A and $x_2$ tons of B?

b. Suppose the output of the steam plant is described by
a vector that lists the amounts of heat, sulfur dioxide,
and particulate matter. Express this output as a linear
combination of two vectors, assuming that the plant burns
$x_1$ tons of A and $x_2$ tons of B.

c. [M] Over a certain time period, the steam plant produced
162 million Btu of heat, 23,610 g of sulfur dioxide, and
1623 g of particulate matter. Determine how many tons
of each type of coal the steam plant must have burned.
Include a vector equation as part of your solution.

29. Let $\mathbf{v}_1, \ldots, \mathbf{v}_k$ be points in $\mathbb{R}^3$ and suppose that for
$j = 1, \ldots, k$ an object with mass $m_j$ is located at point $\mathbf{v}_j$.
Physicists call such objects point masses. The total mass of
the system of point masses is

$$m = m_1 + \cdots + m_k$$

The center of gravity (or center of mass) of the system is

$$\bar{\mathbf{v}} = \frac{1}{m}\left[\, m_1\mathbf{v}_1 + \cdots + m_k\mathbf{v}_k \,\right]$$

Compute the center of gravity of the system consisting of the
following point masses (see the figure):

Point               Mass
v1 = (2, -2, 4)     4 g
v2 = (-4, 2, 3)     2 g
v3 = (4, 0, -2)     3 g
v4 = (1, -6, 0)     5 g

30. Let $\bar{\mathbf{v}}$ be the center of mass of a system of point
masses located at $\mathbf{v}_1, \ldots, \mathbf{v}_k$ as in Exercise 29. Is $\bar{\mathbf{v}}$ in
Span {v1, ..., vk}? Explain.

31. A thin triangular plate of uniform density and thickness has
vertices at v1 = (0, 1), v2 = (8, 1), and v3 = (2, 4), as in the
figure below, and the mass of the plate is 3 g.

[Figure: triangular metal plate with vertices v1, v2, v3]

a. Find the (x, y)-coordinates of the center of mass of the
plate. This "balance point" of the plate coincides with
the center of mass of a system consisting of three 1-gram
point masses located at the vertices of the plate.

b. Determine how to distribute an additional mass of 6 g
at the three vertices of the plate to move the balance
point of the plate to (2, 2). [Hint: Let $w_1$, $w_2$, and $w_3$
denote the masses added at the three vertices, so that
$w_1 + w_2 + w_3 = 6$.]

32. Consider the vectors v1, v2, v3, and b in $\mathbb{R}^2$, shown in the
figure. Does the equation $x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + x_3\mathbf{v}_3 = \mathbf{b}$ have a
solution? Is the solution unique? Use the figure to explain
your answers.

[Figure: the vectors v1, v2, v3, and b in the $x_1x_2$-plane]



33. Use the vectors $\mathbf{u} = (u_1, \ldots, u_n)$, $\mathbf{v} = (v_1, \ldots, v_n)$, and
$\mathbf{w} = (w_1, \ldots, w_n)$ to verify the following algebraic properties
of $\mathbb{R}^n$.

a. (u + v) + w = u + (v + w)

b. c(u + v) = cu + cv for each scalar c

34. Use the vector $\mathbf{u} = (u_1, \ldots, u_n)$ to verify the following algebraic
properties of $\mathbb{R}^n$.

a. u + (-u) = (-u) + u = 0

b. c(du) = (cd)u for all scalars c and d


SOLUTIONS TO PRACTICE PROBLEMS 



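1. Take arbitrary vectors $\mathbf{u} = (u_1, \ldots, u_n)$ and $\mathbf{v} = (v_1, \ldots, v_n)$ in $\mathbb{R}^n$, and compute

$$\begin{aligned} \mathbf{u} + \mathbf{v} &= (u_1 + v_1, \ldots, u_n + v_n) && \text{Definition of vector addition} \\ &= (v_1 + u_1, \ldots, v_n + u_n) && \text{Commutativity of addition in } \mathbb{R} \\ &= \mathbf{v} + \mathbf{u} && \text{Definition of vector addition} \end{aligned}$$

2. The vector y belongs to Span {v1, v2, v3} if and only if there exist scalars $x_1, x_2, x_3$
such that

$$x_1 \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix} + x_2 \begin{bmatrix} 5 \\ -4 \\ -7 \end{bmatrix} + x_3 \begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -4 \\ 3 \\ h \end{bmatrix}$$

This vector equation is equivalent to a system of three linear equations in three
unknowns. If you row reduce the augmented matrix for this system, you find that

$$\begin{bmatrix} 1 & 5 & -3 & -4 \\ -1 & -4 & 1 & 3 \\ -2 & -7 & 0 & h \end{bmatrix} \sim \begin{bmatrix} 1 & 5 & -3 & -4 \\ 0 & 1 & -2 & -1 \\ 0 & 3 & -6 & h - 8 \end{bmatrix} \sim \begin{bmatrix} 1 & 5 & -3 & -4 \\ 0 & 1 & -2 & -1 \\ 0 & 0 & 0 & h - 5 \end{bmatrix}$$

The system is consistent if and only if there is no pivot in the fourth column. That
is, h - 5 must be 0. So y is in Span {v1, v2, v3} if and only if h = 5.

[Figure: The points $\begin{bmatrix} -4 \\ 3 \\ h \end{bmatrix}$ lie on a line that intersects the plane when h = 5.]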

Remember: The presence of a free variable in a system does not guarantee that the 
system is consistent. 
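The consistency test used in Practice Problem 2 can be tried numerically. The following is a minimal sketch, not part of the original text: the helper names `echelon`, `is_consistent`, and `augmented_for` are illustrative, and exact fractions are assumed so that no rounding affects the test for a row of the form [0 ... 0 c] with c nonzero.

```python
from fractions import Fraction


def echelon(rows):
    """Return an echelon form of a matrix (a list of rows), using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column at or below the current pivot row.
        r = next((i for i in range(pivot_row, len(m)) if m[i][col] != 0), None)
        if r is None:
            continue
        m[pivot_row], m[r] = m[r], m[pivot_row]
        # Create zeros below the pivot with row replacement operations.
        for i in range(pivot_row + 1, len(m)):
            factor = m[i][col] / m[pivot_row][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m


def is_consistent(augmented):
    """A system is inconsistent exactly when some echelon row is [0 ... 0 c], c != 0."""
    for row in echelon(augmented):
        if all(x == 0 for x in row[:-1]) and row[-1] != 0:
            return False
    return True


def augmented_for(h):
    # Columns are v1, v2, v3, and y from Practice Problem 2.
    return [[1, 5, -3, -4],
            [-1, -4, 1, 3],
            [-2, -7, 0, h]]


print(is_consistent(augmented_for(5)))  # True: y is in Span{v1, v2, v3}
print(is_consistent(augmented_for(4)))  # False: the last row becomes [0 0 0 -1]
```

Note that for every value of h the system has a free variable (the third column has no pivot), yet only h = 5 gives a consistent system, which is the point of the reminder above.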


1.4 THE MATRIX EQUATION Ax = b

A fundamental idea in linear algebra is to view a linear combination of vectors as the
product of a matrix and a vector. The following definition permits us to rephrase some
of the concepts of Section 1.3 in new ways.

DEFINITION

If A is an m x n matrix, with columns $\mathbf{a}_1, \ldots, \mathbf{a}_n$, and if x is in $\mathbb{R}^n$, then the
product of A and x, denoted by Ax, is the linear combination of the columns
of A using the corresponding entries in x as weights; that is,

$$A\mathbf{x} = \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n$$

Note that Ax is defined only if the number of columns of A equals the number of entries
in x.


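EXAMPLE 1

a. $\begin{bmatrix} 1 & 2 & -1 \\ 0 & -5 & 3 \end{bmatrix} \begin{bmatrix} 4 \\ 3 \\ 7 \end{bmatrix} = 4\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 2 \\ -5 \end{bmatrix} + 7\begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 4 \\ 0 \end{bmatrix} + \begin{bmatrix} 6 \\ -15 \end{bmatrix} + \begin{bmatrix} -7 \\ 21 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix}$

b. $\begin{bmatrix} 2 & -3 \\ 8 & 0 \\ -5 & 2 \end{bmatrix} \begin{bmatrix} 4 \\ 7 \end{bmatrix} = 4\begin{bmatrix} 2 \\ 8 \\ -5 \end{bmatrix} + 7\begin{bmatrix} -3 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 32 \\ -20 \end{bmatrix} + \begin{bmatrix} -21 \\ 0 \\ 14 \end{bmatrix} = \begin{bmatrix} -13 \\ 32 \\ -6 \end{bmatrix}$

■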
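The definition of Ax translates directly into a short routine. The sketch below is only an illustration (the function name `matvec_by_columns` is not from the text): it accumulates the weighted columns of A one at a time, exactly as in Example 1, and rejects a product that is undefined.

```python
def matvec_by_columns(A, x):
    """Compute Ax as x[0]*(column 0 of A) + x[1]*(column 1 of A) + ..."""
    m, n = len(A), len(A[0])
    if len(x) != n:
        raise ValueError("Ax is defined only if A has as many columns as x has entries")
    result = [0] * m
    for j in range(n):          # take the columns of A one at a time
        for i in range(m):      # add x[j] times column j to the running sum
            result[i] += x[j] * A[i][j]
    return result


print(matvec_by_columns([[1, 2, -1], [0, -5, 3]], [4, 3, 7]))   # Example 1(a): [3, 6]
print(matvec_by_columns([[2, -3], [8, 0], [-5, 2]], [4, 7]))    # Example 1(b): [-13, 32, -6]
```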


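EXAMPLE 2 For v1, v2, v3 in $\mathbb{R}^m$, write the linear combination $3\mathbf{v}_1 - 5\mathbf{v}_2 + 7\mathbf{v}_3$ as
a matrix times a vector.

SOLUTION Place v1, v2, v3 into the columns of a matrix A and place the weights 3, -5,
and 7 into a vector x. That is,

$$3\mathbf{v}_1 - 5\mathbf{v}_2 + 7\mathbf{v}_3 = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 \end{bmatrix} \begin{bmatrix} 3 \\ -5 \\ 7 \end{bmatrix} = A\mathbf{x}$$

■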


Section 1.3 showed how to write a system of linear equations as a vector equation 
involving a linear combination of vectors. For example, the system 


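$$\begin{aligned} x_1 + 2x_2 - x_3 &= 4 \\ -5x_2 + 3x_3 &= 1 \end{aligned} \tag{1}$$

is equivalent to

$$x_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ -5 \end{bmatrix} + x_3 \begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix} \tag{2}$$

As in Example 2, the linear combination on the left side is a matrix times a vector, so
that (2) becomes

$$\begin{bmatrix} 1 & 2 & -1 \\ 0 & -5 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix} \tag{3}$$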


Equation (3) has the form Ax = b. Such an equation is called a matrix equation,
to distinguish it from a vector equation such as is shown in (2).

Notice how the matrix in (3) is just the matrix of coefficients of the system (1).
Similar calculations show that any system of linear equations, or any vector equation
such as (2), can be written as an equivalent matrix equation in the form Ax = b. This
simple observation will be used repeatedly throughout the text.

Here is the formal result.

THEOREM 3

If A is an m x n matrix, with columns $\mathbf{a}_1, \ldots, \mathbf{a}_n$, and if b is in $\mathbb{R}^m$, the matrix
equation

$$A\mathbf{x} = \mathbf{b} \tag{4}$$

has the same solution set as the vector equation

$$x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{b} \tag{5}$$

which, in turn, has the same solution set as the system of linear equations whose
augmented matrix is

$$\begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n & \mathbf{b} \end{bmatrix} \tag{6}$$


Theorem 3 provides a powerful tool for gaining insight into problems in linear 
algebra, because a system of linear equations may now be viewed in three different 
but equivalent ways: as a matrix equation, as a vector equation, or as a system of linear 
equations. Whenever you construct a mathematical model of a problem in real life, you 
are free to choose whichever viewpoint is most natural. Then you may switch from one 
formulation of a problem to another whenever it is convenient. In any case, the matrix 
equation (4), the vector equation (5), and the system of equations are all solved in the
same way—by row reducing the augmented matrix (6). Other methods of solution will 
be discussed later. 

Existence of Solutions 

The definition of Ax leads directly to the following useful fact. 


The equation Ax = b has a solution if and only if b is a linear combination of the 
columns of A. 


Section 1.3 considered the existence question, "Is b in Span {a1, ..., an}?" Equivalently,
"Is Ax = b consistent?" A harder existence problem is to determine whether
the equation Ax = b is consistent for all possible b.



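EXAMPLE 3 Let $A = \begin{bmatrix} 1 & 3 & 4 \\ -4 & 2 & -6 \\ -3 & -2 & -7 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$. Is the equation Ax = b
consistent for all possible $b_1, b_2, b_3$?

SOLUTION Row reduce the augmented matrix for Ax = b:

$$\begin{bmatrix} 1 & 3 & 4 & b_1 \\ -4 & 2 & -6 & b_2 \\ -3 & -2 & -7 & b_3 \end{bmatrix} \sim \begin{bmatrix} 1 & 3 & 4 & b_1 \\ 0 & 14 & 10 & b_2 + 4b_1 \\ 0 & 7 & 5 & b_3 + 3b_1 \end{bmatrix} \sim \begin{bmatrix} 1 & 3 & 4 & b_1 \\ 0 & 14 & 10 & b_2 + 4b_1 \\ 0 & 0 & 0 & b_3 + 3b_1 - \tfrac{1}{2}(b_2 + 4b_1) \end{bmatrix}$$

The third entry in column 4 equals $b_1 - \tfrac{1}{2}b_2 + b_3$. The equation Ax = b is not
consistent for every b because some choices of b can make $b_1 - \tfrac{1}{2}b_2 + b_3$ nonzero. ■

FIGURE 1
The columns of $A = [\,\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \mathbf{a}_3\,]$ span a plane through 0.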

The reduced matrix in Example 3 provides a description of all b for which the
equation Ax = b is consistent: The entries in b must satisfy

$$b_1 - \tfrac{1}{2}b_2 + b_3 = 0$$

This is the equation of a plane through the origin in $\mathbb{R}^3$. The plane is the set of all linear
combinations of the three columns of A. See Fig. 1.

The equation Ax = b in Example 3 fails to be consistent for all b because the
echelon form of A has a row of zeros. If A had a pivot in all three rows, we would
not care about the calculations in the augmented column because in this case an echelon
form of the augmented matrix could not have a row such as [ 0 0 0 1 ].

In the next theorem, the sentence "The columns of A span $\mathbb{R}^m$" means that every b in
$\mathbb{R}^m$ is a linear combination of the columns of A. In general, a set of vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$
in $\mathbb{R}^m$ spans (or generates) $\mathbb{R}^m$ if every vector in $\mathbb{R}^m$ is a linear combination of
$\mathbf{v}_1, \ldots, \mathbf{v}_p$, that is, if Span $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\} = \mathbb{R}^m$.

THEOREM 4

Let A be an m x n matrix. Then the following statements are logically equivalent.
That is, for a particular A, either they are all true statements or they are all false.

a. For each b in $\mathbb{R}^m$, the equation Ax = b has a solution.

b. Each b in $\mathbb{R}^m$ is a linear combination of the columns of A.

c. The columns of A span $\mathbb{R}^m$.

d. A has a pivot position in every row.


Theorem 4 is one of the most useful theorems in this chapter. Statements (a),
(b), and (c) are equivalent because of the definition of Ax and what it means for a
set of vectors to span $\mathbb{R}^m$. The discussion after Example 3 suggests why (a) and (d)
are equivalent; a proof is given at the end of the section. The exercises will provide
examples of how Theorem 4 is used.

Warning: Theorem 4 is about a coefficient matrix, not an augmented matrix. If an
augmented matrix [ A b ] has a pivot position in every row, then the equation Ax = b
may or may not be consistent.
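Statement (d) of Theorem 4 gives a mechanical test. The sketch below is an illustration under stated assumptions: the names `count_pivots` and `columns_span_Rm` are hypothetical, exact fractions are used, and the vector b = (1, 0, 0) is chosen here only because it violates the plane equation found after Example 3.

```python
from fractions import Fraction


def count_pivots(rows):
    """Count pivot positions by reducing the matrix to echelon form."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = 0
    for col in range(len(m[0])):
        r = next((i for i in range(pivots, len(m)) if m[i][col] != 0), None)
        if r is None:
            continue                      # no pivot in this column
        m[pivots], m[r] = m[r], m[pivots]
        for i in range(pivots + 1, len(m)):
            factor = m[i][col] / m[pivots][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[pivots])]
        pivots += 1
        if pivots == len(m):
            break                         # every row already has a pivot
    return pivots


def columns_span_Rm(A):
    """Theorem 4(d): the columns span R^m iff A has a pivot position in every row."""
    return count_pivots(A) == len(A)


A = [[1, 3, 4], [-4, 2, -6], [-3, -2, -7]]   # coefficient matrix of Example 3
print(columns_span_Rm(A))                     # False: only two pivot positions

# The Warning: [A b] can have a pivot in every row while Ax = b is inconsistent.
A_b = [[1, 3, 4, 1], [-4, 2, -6, 0], [-3, -2, -7, 0]]
print(count_pivots(A_b))                      # 3, with the last pivot in the augmented column
```

The test must be applied to the coefficient matrix; the second call shows why counting pivots in the augmented matrix says nothing about consistency.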


Computation of Ax 


The calculations in Example 1 were based on the definition of the product of a matrix A 
and a vector x. The following simple example will lead to a more efficient method for 
calculating the entries in Ax when working problems by hand. 


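EXAMPLE 4 Compute Ax, where $A = \begin{bmatrix} 2 & 3 & 4 \\ -1 & 5 & -3 \\ 6 & -2 & 8 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$.

SOLUTION From the definition,

$$\begin{aligned} \begin{bmatrix} 2 & 3 & 4 \\ -1 & 5 & -3 \\ 6 & -2 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} &= x_1 \begin{bmatrix} 2 \\ -1 \\ 6 \end{bmatrix} + x_2 \begin{bmatrix} 3 \\ 5 \\ -2 \end{bmatrix} + x_3 \begin{bmatrix} 4 \\ -3 \\ 8 \end{bmatrix} \\ &= \begin{bmatrix} 2x_1 \\ -x_1 \\ 6x_1 \end{bmatrix} + \begin{bmatrix} 3x_2 \\ 5x_2 \\ -2x_2 \end{bmatrix} + \begin{bmatrix} 4x_3 \\ -3x_3 \\ 8x_3 \end{bmatrix} \\ &= \begin{bmatrix} 2x_1 + 3x_2 + 4x_3 \\ -x_1 + 5x_2 - 3x_3 \\ 6x_1 - 2x_2 + 8x_3 \end{bmatrix} \end{aligned} \tag{7}$$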

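The first entry in the product Ax is a sum of products (sometimes called a dot product),
using the first row of A and the entries in x. That is,

$$\begin{bmatrix} 2 & 3 & 4 \\ \phantom{0} & & \\ \phantom{0} & & \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 + 4x_3 \\ \phantom{0} \\ \phantom{0} \end{bmatrix}$$

This matrix shows how to compute the first entry in Ax directly, without writing down
all the calculations shown in (7). Similarly, the second entry in Ax can be calculated at
once by multiplying the entries in the second row of A by the corresponding entries in
x and then summing the resulting products:

$$\begin{bmatrix} \phantom{0} & & \\ -1 & 5 & -3 \\ \phantom{0} & & \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \phantom{0} \\ -x_1 + 5x_2 - 3x_3 \\ \phantom{0} \end{bmatrix}$$

Likewise, the third entry in Ax can be calculated from the third row of A and the entries
in x. ■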

Row-Vector Rule for Computing Ax 

If the product Ax is defined, then the ith entry in Ax is the sum of the products of
corresponding entries from row i of A and from the vector x. 
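The row-vector rule can be sketched in a few lines. This is an illustrative helper (the name `matvec_by_rows` is an assumption, not something defined in the text): each entry of Ax is the sum of products of one row of A with the entries of x.

```python
def matvec_by_rows(A, x):
    """Row-vector rule: entry i of Ax is the sum of products of row i of A with x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("Ax is defined only if each row of A has as many entries as x")
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# The same two products that Example 1 computes from the definition.
print(matvec_by_rows([[1, 2, -1], [0, -5, 3]], [4, 3, 7]))   # [3, 6]
print(matvec_by_rows([[2, -3], [8, 0], [-5, 2]], [4, 7]))    # [-13, 32, -6]
```

Both rules give the same vector; they differ only in the order in which the products are formed and added.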


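EXAMPLE 5

a. $\begin{bmatrix} 1 & 2 & -1 \\ 0 & -5 & 3 \end{bmatrix} \begin{bmatrix} 4 \\ 3 \\ 7 \end{bmatrix} = \begin{bmatrix} 1 \cdot 4 + 2 \cdot 3 + (-1) \cdot 7 \\ 0 \cdot 4 + (-5) \cdot 3 + 3 \cdot 7 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix}$

b. $\begin{bmatrix} 2 & -3 \\ 8 & 0 \\ -5 & 2 \end{bmatrix} \begin{bmatrix} 4 \\ 7 \end{bmatrix} = \begin{bmatrix} 2 \cdot 4 + (-3) \cdot 7 \\ 8 \cdot 4 + 0 \cdot 7 \\ (-5) \cdot 4 + 2 \cdot 7 \end{bmatrix} = \begin{bmatrix} -13 \\ 32 \\ -6 \end{bmatrix}$

c. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r \\ s \\ t \end{bmatrix} = \begin{bmatrix} 1 \cdot r + 0 \cdot s + 0 \cdot t \\ 0 \cdot r + 1 \cdot s + 0 \cdot t \\ 0 \cdot r + 0 \cdot s + 1 \cdot t \end{bmatrix} = \begin{bmatrix} r \\ s \\ t \end{bmatrix}$

■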


By definition, the matrix in Example 5(c) with 1's on the diagonal and 0's elsewhere
is called an identity matrix and is denoted by I. The calculation in part (c) shows that
Ix = x for every x in $\mathbb{R}^3$. There is an analogous n x n identity matrix, sometimes
written as $I_n$. As in part (c), $I_n\mathbf{x} = \mathbf{x}$ for every x in $\mathbb{R}^n$.

Properties of the Matrix-Vector Product Ax

The facts in the next theorem are important and will be used throughout the text. The
proof relies on the definition of Ax and the algebraic properties of $\mathbb{R}^n$.

THEOREM 5

If A is an m x n matrix, u and v are vectors in $\mathbb{R}^n$, and c is a scalar, then:

a. A(u + v) = Au + Av;

b. A(cu) = c(Au).

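PROOF For simplicity, take n = 3, $A = [\,\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \mathbf{a}_3\,]$, and u, v in $\mathbb{R}^3$. (The proof of
the general case is similar.) For i = 1, 2, 3, let $u_i$ and $v_i$ be the ith entries in u and v,
respectively. To prove statement (a), compute A(u + v) as a linear combination of the
columns of A using the entries in u + v as weights.

$$\begin{aligned} A(\mathbf{u} + \mathbf{v}) &= \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 \end{bmatrix} \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{bmatrix} \\ &= (u_1 + v_1)\mathbf{a}_1 + (u_2 + v_2)\mathbf{a}_2 + (u_3 + v_3)\mathbf{a}_3 && \text{entries in } \mathbf{u} + \mathbf{v} \text{ times columns of } A \\ &= (u_1\mathbf{a}_1 + u_2\mathbf{a}_2 + u_3\mathbf{a}_3) + (v_1\mathbf{a}_1 + v_2\mathbf{a}_2 + v_3\mathbf{a}_3) \\ &= A\mathbf{u} + A\mathbf{v} \end{aligned}$$

To prove statement (b), compute A(cu) as a linear combination of the columns of A
using the entries in cu as weights.

$$\begin{aligned} A(c\mathbf{u}) &= \begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 \end{bmatrix} \begin{bmatrix} cu_1 \\ cu_2 \\ cu_3 \end{bmatrix} = (cu_1)\mathbf{a}_1 + (cu_2)\mathbf{a}_2 + (cu_3)\mathbf{a}_3 \\ &= c(u_1\mathbf{a}_1) + c(u_2\mathbf{a}_2) + c(u_3\mathbf{a}_3) \\ &= c(u_1\mathbf{a}_1 + u_2\mathbf{a}_2 + u_3\mathbf{a}_3) \\ &= c(A\mathbf{u}) \end{aligned}$$

■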
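A numerical spot check of Theorem 5 is easy to run. The sketch below is not a proof, only one instance: it uses the matrix and vectors of Practice Problem 2 of this section, while the scalar c = 3 and the helper names `matvec`, `add`, and `scale` are choices made here for illustration.

```python
def matvec(A, x):
    """Compute Ax with the row-vector rule."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def add(u, v):
    """Entrywise sum of two vectors of the same size."""
    return [ui + vi for ui, vi in zip(u, v)]


def scale(c, u):
    """Scalar multiple cu."""
    return [c * ui for ui in u]


A = [[2, 5], [3, 1]]
u = [4, -1]
v = [-3, 5]
c = 3  # an arbitrary scalar chosen for this check

# Theorem 5(a): A(u + v) = Au + Av
print(matvec(A, add(u, v)), add(matvec(A, u), matvec(A, v)))   # [22, 7] [22, 7]

# Theorem 5(b): A(cu) = c(Au)
print(matvec(A, scale(c, u)), scale(c, matvec(A, u)))          # [9, 33] [9, 33]
```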


NUMERICAL NOTE

To optimize a computer algorithm to compute Ax, the sequence of calculations 
should involve data stored in contiguous memory locations. The most widely 
used professional algorithms for matrix computations are written in Fortran, a 
language that stores a matrix as a set of columns. Such algorithms compute Ax 
as a linear combination of the columns of A. In contrast, if a program is written in 
the popular language C, which stores matrices by rows, Ax should be computed 
via the alternative rule that uses the rows of A. 
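The two access patterns in the note can be sketched with a matrix stored in a flat list. This is an illustration only: the function names and the flat-list layout are assumptions made here, and a Python list merely models the indexing pattern (the memory effect the note describes matters in compiled languages, not in this sketch).

```python
def matvec_column_major(data, m, n, x):
    """data stores an m x n matrix column by column: entry (i, j) is data[j*m + i]."""
    y = [0] * m
    for j in range(n):                  # one column at a time
        for i in range(m):              # walks data[j*m], data[j*m + 1], ... in order
            y[i] += x[j] * data[j * m + i]
    return y


def matvec_row_major(data, m, n, x):
    """data stores an m x n matrix row by row: entry (i, j) is data[i*n + j]."""
    y = [0] * m
    for i in range(m):                  # one row at a time
        total = 0
        for j in range(n):              # walks data[i*n], data[i*n + 1], ... in order
            total += data[i * n + j] * x[j]
        y[i] = total
    return y


# The 3 x 2 matrix of Example 1(b), stored both ways.
by_columns = [2, 8, -5, -3, 0, 2]
by_rows = [2, -3, 8, 0, -5, 2]
x = [4, 7]

print(matvec_column_major(by_columns, 3, 2, x))   # [-13, 32, -6]
print(matvec_row_major(by_rows, 3, 2, x))         # [-13, 32, -6]
```

In each routine the inner loop visits consecutive positions of `data`, which is the property the note recommends.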


PROOF OF THEOREM 4 As was pointed out after Theorem 4, statements (a), (b), and
(c) are logically equivalent. So, it suffices to show (for an arbitrary matrix A) that (a)
and (d) are either both true or both false. That will tie all four statements together.

Let U be an echelon form of A. Given b in $\mathbb{R}^m$, we can row reduce the augmented
matrix [ A b ] to an augmented matrix [ U d ] for some d in $\mathbb{R}^m$:

$$[\,A \;\; \mathbf{b}\,] \sim \cdots \sim [\,U \;\; \mathbf{d}\,]$$

If statement (d) is true, then each row of U contains a pivot position and there can be
no pivot in the augmented column. So Ax = b has a solution for any b, and (a) is true.
If (d) is false, the last row of U is all zeros. Let d be any vector with a 1 in its last entry.
Then [ U d ] represents an inconsistent system. Since row operations are reversible,
[ U d ] can be transformed into the form [ A b ]. The new system Ax = b is also
inconsistent, and (a) is false. ■


PRACTICE PROBLEMS

1. Let $A = \begin{bmatrix} 1 & 5 & -2 & 0 \\ -3 & 1 & 9 & -5 \\ 4 & -8 & -1 & 7 \end{bmatrix}$, $\mathbf{p} = \begin{bmatrix} 3 \\ -2 \\ 0 \\ -4 \end{bmatrix}$, and $\mathbf{b} = \begin{bmatrix} -7 \\ 9 \\ 0 \end{bmatrix}$. It can be shown
that p is a solution of Ax = b. Use this fact to exhibit b as a specific linear
combination of the columns of A.

2. Let $A = \begin{bmatrix} 2 & 5 \\ 3 & 1 \end{bmatrix}$, $\mathbf{u} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}$, and $\mathbf{v} = \begin{bmatrix} -3 \\ 5 \end{bmatrix}$. Verify Theorem 5(a) in this case
by computing A(u + v) and Au + Av.


1.4 EXERCISES

Compute the products in Exercises 1-4 using (a) the definition, as
in Example 1, and (b) the row-vector rule for computing Ax. If a
product is undefined, explain why.

1. $\begin{bmatrix} -4 & 2 \\ 1 & 6 \\ 0 & 1 \end{bmatrix}$

3. $\begin{bmatrix} 1 & 2 \\ -3 & 1 \\ 1 & 6 \end{bmatrix}$

In Exercises 5-8, use the definition of Ax to write the matrix
equation as a vector equation, or vice versa.

6. $\begin{bmatrix} 2 & -3 \\ 3 & 2 \\ 8 & -5 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} -21 \\ 1 \\ -49 \\ 11 \end{bmatrix}$

7. $x_1 \begin{bmatrix} 4 \\ -1 \\ 7 \\ -4 \end{bmatrix} + x_2 \begin{bmatrix} -5 \\ 3 \\ -5 \\ 1 \end{bmatrix} + x_3 \begin{bmatrix} 7 \\ -8 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ -8 \\ 0 \\ -7 \end{bmatrix}$

8. $z_1 \begin{bmatrix} 2 \\ -4 \end{bmatrix} + z_2 \begin{bmatrix} -1 \\ 5 \end{bmatrix} + z_3 \begin{bmatrix} -4 \\ 3 \end{bmatrix} + z_4 \begin{bmatrix} 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 12 \end{bmatrix}$

In Exercises 9 and 10, write the system first as a vector equation
and then as a matrix equation.

9. $\begin{aligned} 5x_1 + x_2 - 3x_3 &= 8 \\ 2x_2 + 4x_3 &= 0 \end{aligned}$

10. $\begin{aligned} 4x_1 - x_2 &= 8 \\ 5x_1 + 3x_2 &= 2 \\ 3x_1 - x_2 &= 1 \end{aligned}$

Given A and b in Exercises 11 and 12, write the augmented matrix
for the linear system that corresponds to the matrix equation
Ax = b. Then solve the system and write the solution as a vector.

11. $A = \begin{bmatrix} 1 & 3 & -4 \\ 1 & 5 & 2 \\ -3 & -7 & 6 \end{bmatrix},\ \mathbf{b} = \begin{bmatrix} -2 \\ 4 \\ 12 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & 2 & -1 \\ -3 & -4 & 2 \\ 5 & 2 & 3 \end{bmatrix},\ \mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}$

13. Let $\mathbf{u} = \begin{bmatrix} 0 \\ 4 \\ 4 \end{bmatrix}$ and $A = \begin{bmatrix} 3 & -5 \\ -2 & 6 \\ 1 & 1 \end{bmatrix}$. Is u in the plane in
$\mathbb{R}^3$ spanned by the columns of A? (See the figure.) Why or
why not?

[Figure: Plane spanned by the columns of A]

14. Let $\mathbf{u} = \begin{bmatrix} 4 \\ -1 \\ 4 \end{bmatrix}$ and $A = \begin{bmatrix} 2 & 5 & -1 \\ 0 & 1 & -1 \\ 1 & 2 & 0 \end{bmatrix}$. Is u in the subset
of $\mathbb{R}^3$ spanned by the columns of A? Why or why not?

15. Let A = and $\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$. Show that the equation
Ax = b does not have a solution for all possible b, and
describe the set of all b for which Ax = b does have a
solution.

16. Repeat the requests from Exercise 15 with
$A = \begin{bmatrix} 1 & -2 & -1 \\ -2 & 2 & 0 \\ 4 & -1 & 3 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$.


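Exercises 17-20 refer to the matrices A and B below. Make
appropriate calculations that justify your answers and mention an
appropriate theorem.

$$A = \begin{bmatrix} 1 & 3 & 0 & 3 \\ -1 & -1 & -1 & 1 \\ 0 & -4 & 2 & -8 \\ 2 & 0 & 3 & -1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 4 & 1 & 2 \\ 0 & 1 & 3 & -4 \\ 0 & 2 & 6 & 7 \\ 2 & 9 & 5 & -7 \end{bmatrix}$$

17. How many rows of A contain a pivot position? Does the
equation Ax = b have a solution for each b in $\mathbb{R}^4$?

18. Can every vector in $\mathbb{R}^4$ be written as a linear combination of
the columns of the matrix B above? Do the columns of B
span $\mathbb{R}^3$?

19. Can each vector in $\mathbb{R}^4$ be written as a linear combination of
the columns of the matrix A above? Do the columns of A
span $\mathbb{R}^4$?

20. Do the columns of B span $\mathbb{R}^4$? Does the equation Bx = y
have a solution for each y in $\mathbb{R}^4$?

21. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 0 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 0 \\ -1 \\ 0 \\ 1 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix}$. Does
{v1, v2, v3} span $\mathbb{R}^4$? Why or why not?

22. Let $\mathbf{v}_1 = \begin{bmatrix} 0 \\ 0 \\ -3 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 0 \\ -3 \\ 9 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ -2 \\ -6 \end{bmatrix}$. Does
{v1, v2, v3} span $\mathbb{R}^3$? Why or why not?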


In Exercises 23 and 24, mark each statement True or False. Justify 
each answer. 


23. a. The equation Ax = b is referred to as a vector equation. 

b. A vector b is a linear combination of the columns of a 
matrix A if and only if the equation Ax = b has at least 
one solution. 

c. The equation Ax = b is consistent if the augmented ma¬ 
trix [ A b ] has a pivot position in every row. 

d. The first entry in the product Ax is a sum of products. 

e. If the columns of an m x n matrix A span $\mathbb{R}^m$, then the
equation Ax = b is consistent for each b in $\mathbb{R}^m$.

f. If A is an m x n matrix and if the equation Ax = b is
inconsistent for some b in $\mathbb{R}^m$, then A cannot have a pivot
position in every row.

24. a. Every matrix equation Ax = b corresponds to a vector
equation with the same solution set.

b. If the equation Ax = b is consistent, then b is in the set
spanned by the columns of A.

c. Any linear combination of vectors can always be written
in the form Ax for a suitable matrix A and vector x.

d. If the coefficient matrix A has a pivot position in every
row, then the equation Ax = b is inconsistent.

e. The solution set of a linear system whose augmented
matrix is $[\,\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \mathbf{a}_3 \;\; \mathbf{b}\,]$ is the same as the solution set
of Ax = b, if $A = [\,\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \mathbf{a}_3\,]$.

f. If A is an m x n matrix whose columns do not span $\mathbb{R}^m$,
then the equation Ax = b is consistent for every b in $\mathbb{R}^m$.


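25. Note that $\begin{bmatrix} 4 & -3 & 1 \\ 5 & -2 & 5 \\ -6 & 2 & -3 \end{bmatrix} \begin{bmatrix} -3 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} -7 \\ -3 \\ 10 \end{bmatrix}$. Use this
fact (and no row operations) to find scalars $c_1, c_2, c_3$ such
that $\begin{bmatrix} -7 \\ -3 \\ 10 \end{bmatrix} = c_1 \begin{bmatrix} 4 \\ 5 \\ -6 \end{bmatrix} + c_2 \begin{bmatrix} -3 \\ -2 \\ 2 \end{bmatrix} + c_3 \begin{bmatrix} 1 \\ 5 \\ -3 \end{bmatrix}$.

26. Let $\mathbf{u} = \begin{bmatrix} 7 \\ 2 \\ 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 3 \\ 1 \\ 3 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} 5 \\ 1 \\ 1 \end{bmatrix}$. It can be
shown that 2u - 3v - w = 0. Use this fact (and no row
operations) to find $x_1$ and $x_2$ that satisfy the equation

$$\begin{bmatrix} 7 & 3 \\ 2 & 1 \\ 5 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ 1 \end{bmatrix}$$

27. Rewrite the (numerical) matrix equation below in symbolic
form as a vector equation, using symbols $\mathbf{v}_1, \mathbf{v}_2, \ldots$ for the
vectors and $c_1, c_2, \ldots$ for scalars. Define what each symbol
represents, using the data given in the matrix equation.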

"-3 5 -4 9 7 

5 8 1-2-4 

28. Let q1, q2, q3, and v represent vectors in $\mathbb{R}^5$, and let $x_1$, $x_2$,
and $x_3$ denote scalars. Write the following vector equation as
a matrix equation. Identify any symbols you choose to use.

$$x_1\mathbf{q}_1 + x_2\mathbf{q}_2 + x_3\mathbf{q}_3 = \mathbf{v}$$

29. Construct a 3 x 3 matrix, not in echelon form, whose
columns span $\mathbb{R}^3$. Show that the matrix you construct has
the desired property.

30. Construct a 3 x 3 matrix, not in echelon form, whose
columns do not span $\mathbb{R}^3$. Show that the matrix you construct
has the desired property.

31. Let A be a 3 x 2 matrix. Explain why the equation Ax = b
cannot be consistent for all b in $\mathbb{R}^3$. Generalize your
argument to the case of an arbitrary A with more rows than
columns.



























































32. Could a set of three vectors in $\mathbb{R}^4$ span all of $\mathbb{R}^4$? Explain.
What about n vectors in $\mathbb{R}^m$ when n is less than m?

33. Suppose A is a 4 x 3 matrix and b is a vector in $\mathbb{R}^4$ with
the property that Ax = b has a unique solution. What can
you say about the reduced echelon form of A? Justify your
answer.

34. Let A be a 3 x 4 matrix, let v1 and v2 be vectors in $\mathbb{R}^3$, and
let w = v1 + v2. Suppose v1 = Au1 and v2 = Au2 for some
vectors u1 and u2 in $\mathbb{R}^4$. What fact allows you to conclude
that the system Ax = w is consistent? (Note: u1 and u2
denote vectors, not scalar entries in vectors.)

35. Let A be a 5 x 3 matrix, let y be a vector in $\mathbb{R}^3$, and let z be
a vector in $\mathbb{R}^5$. Suppose Ay = z. What fact allows you to
conclude that the system Ax = 5z is consistent?

36. Suppose A is a 4 x 4 matrix and b is a vector in $\mathbb{R}^4$ with the
property that Ax = b has a unique solution. Explain why the
columns of A must span $\mathbb{R}^4$.

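[M] In Exercises 37-40, determine if the columns of the matrix
span $\mathbb{R}^4$.

37. $\begin{bmatrix} 7 & 2 & -5 & 8 \\ -5 & -3 & 4 & -9 \\ 6 & 10 & -2 & 7 \\ -7 & 9 & 2 & 15 \end{bmatrix}$

38. $\begin{bmatrix} 4 & -5 & -1 & 8 \\ 3 & -7 & -4 & 2 \\ 5 & -6 & -1 & 4 \\ 9 & 1 & 10 & 7 \end{bmatrix}$

39. $\begin{bmatrix} 10 & -7 & 1 & 4 & 6 \\ -8 & 4 & -6 & -10 & -3 \\ -7 & 11 & -5 & -1 & -8 \\ 3 & -1 & 10 & 12 & 12 \end{bmatrix}$

40. $\begin{bmatrix} 5 & 11 & -6 & -7 & 12 \\ -7 & -3 & -4 & 6 & -9 \\ 11 & 5 & 6 & -9 & -3 \\ -3 & 4 & -7 & 2 & 7 \end{bmatrix}$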


41. [M] Find a column of the matrix in Exercise 39 that can be
deleted and yet have the remaining matrix columns still span
$\mathbb{R}^4$.

42. [M] Find a column of the matrix in Exercise 40 that can be
deleted and yet have the remaining matrix columns still span
$\mathbb{R}^4$. Can you delete more than one column?


SG Mastering Linear Algebra Concepts: Span 1-18 WEB 


SOLUTIONS TO PRACTICE PROBLEMS 


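1. The matrix equation

$$\begin{bmatrix} 1 & 5 & -2 & 0 \\ -3 & 1 & 9 & -5 \\ 4 & -8 & -1 & 7 \end{bmatrix} \begin{bmatrix} 3 \\ -2 \\ 0 \\ -4 \end{bmatrix} = \begin{bmatrix} -7 \\ 9 \\ 0 \end{bmatrix}$$

is equivalent to the vector equation

$$3 \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix} - 2 \begin{bmatrix} 5 \\ 1 \\ -8 \end{bmatrix} + 0 \begin{bmatrix} -2 \\ 9 \\ -1 \end{bmatrix} - 4 \begin{bmatrix} 0 \\ -5 \\ 7 \end{bmatrix} = \begin{bmatrix} -7 \\ 9 \\ 0 \end{bmatrix}$$

which expresses b as a linear combination of the columns of A.

2. $$\mathbf{u} + \mathbf{v} = \begin{bmatrix} 4 \\ -1 \end{bmatrix} + \begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix}$$

$$A(\mathbf{u} + \mathbf{v}) = \begin{bmatrix} 2 & 5 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 + 20 \\ 3 + 4 \end{bmatrix} = \begin{bmatrix} 22 \\ 7 \end{bmatrix}$$

$$A\mathbf{u} + A\mathbf{v} = \begin{bmatrix} 2 & 5 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ -1 \end{bmatrix} + \begin{bmatrix} 2 & 5 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 \\ 11 \end{bmatrix} + \begin{bmatrix} 19 \\ -4 \end{bmatrix} = \begin{bmatrix} 22 \\ 7 \end{bmatrix}$$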
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1.5 SOLUTION SETS OF LINEAR SYSTEMS 


Solution sets of linear systems are important objects of study in linear algebra. They 
will appear later in several different contexts. This section uses vector notation to give 
explicit and geometric descriptions of such solution sets. 


Homogeneous Linear Systems 

A system of linear equations is said to be homogeneous if it can be written in the
form Ax = 0, where A is an m x n matrix and 0 is the zero vector in $\mathbb{R}^m$. Such a
system Ax = 0 always has at least one solution, namely, x = 0 (the zero vector in $\mathbb{R}^n$).
This zero solution is usually called the trivial solution. For a given equation Ax = 0,
the important question is whether there exists a nontrivial solution, that is, a nonzero
vector x that satisfies Ax = 0. The Existence and Uniqueness Theorem in Section 1.2
(Theorem 2) leads immediately to the following fact.


The homogeneous equation Ax = 0 has a nontrivial solution if and only if the 
equation has at least one free variable. 


EXAMPLE 1 Determine if the following homogeneous system has a nontrivial
solution. Then describe the solution set.

$$\begin{aligned} 3x_1 + 5x_2 - 4x_3 &= 0 \\ -3x_1 - 2x_2 + 4x_3 &= 0 \\ 6x_1 + x_2 - 8x_3 &= 0 \end{aligned}$$

SOLUTION Let A be the matrix of coefficients of the system and row reduce the
augmented matrix [ A 0 ] to echelon form:

$$\begin{bmatrix} 3 & 5 & -4 & 0 \\ -3 & -2 & 4 & 0 \\ 6 & 1 & -8 & 0 \end{bmatrix} \sim \begin{bmatrix} 3 & 5 & -4 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & -9 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 3 & 5 & -4 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Since $x_3$ is a free variable, Ax = 0 has nontrivial solutions (one for each choice of $x_3$).
To describe the solution set, continue the row reduction of [ A 0 ] to reduced echelon
form:

$$\begin{bmatrix} 1 & 0 & -\tfrac{4}{3} & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad \begin{aligned} x_1 - \tfrac{4}{3}x_3 &= 0 \\ x_2 &= 0 \\ 0 &= 0 \end{aligned}$$

Solve for the basic variables $x_1$ and $x_2$ and obtain $x_1 = \tfrac{4}{3}x_3$, $x_2 = 0$, with $x_3$ free. As a
vector, the general solution of Ax = 0 has the form

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \tfrac{4}{3}x_3 \\ 0 \\ x_3 \end{bmatrix} = x_3 \begin{bmatrix} \tfrac{4}{3} \\ 0 \\ 1 \end{bmatrix} = x_3\mathbf{v}, \quad \text{where } \mathbf{v} = \begin{bmatrix} \tfrac{4}{3} \\ 0 \\ 1 \end{bmatrix}$$

Here $x_3$ is factored out of the expression for the general solution vector. This shows that
every solution of Ax = 0 in this case is a scalar multiple of v. The trivial solution is
obtained by choosing $x_3 = 0$. Geometrically, the solution set is a line through 0 in $\mathbb{R}^3$.
See Fig. 1. ■

FIGURE 1

Notice that a nontrivial solution x can have some zero entries so long as not all of
its entries are zero.
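The procedure of Example 1 (row reduce, then read off one vector per free variable) can be sketched as follows. This is a minimal illustration with exact fractions; the names `rref` and `homogeneous_solution_vectors` are hypothetical, and the routine works on the coefficient matrix alone because the augmented column of zeros never changes under row operations.

```python
from fractions import Fraction


def rref(rows):
    """Return the reduced echelon form of a matrix and the list of pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_cols = []
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                          # column c has no pivot: a free variable
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]      # scale so the pivot is 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:       # clear the rest of the pivot column
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
    return m, pivot_cols


def homogeneous_solution_vectors(A):
    """Return one vector per free variable; their span is the solution set of Ax = 0."""
    R, pivot_cols = rref(A)
    n = len(A[0])
    vectors = []
    for f in (c for c in range(n) if c not in pivot_cols):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                    # set this free variable to 1, the others to 0
        for row_index, pc in enumerate(pivot_cols):
            v[pc] = -R[row_index][f]          # solve for each basic variable
        vectors.append(v)
    return vectors


A = [[3, 5, -4], [-3, -2, 4], [6, 1, -8]]     # coefficient matrix of Example 1
print(homogeneous_solution_vectors(A))         # [[Fraction(4, 3), Fraction(0, 1), Fraction(1, 1)]]
```

An empty list returned by `homogeneous_solution_vectors` means there are no free variables, so Ax = 0 has only the trivial solution.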



EXAMPLE 2 A single linear equation can be treated as a very simple system of
equations. Describe all solutions of the homogeneous "system"

$$10x_1 - 3x_2 - 2x_3 = 0 \tag{1}$$

SOLUTION There is no need for matrix notation. Solve for the basic variable $x_1$ in
terms of the free variables. The general solution is $x_1 = .3x_2 + .2x_3$, with $x_2$ and $x_3$
free. As a vector, the general solution is

$$\begin{aligned} \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} &= \begin{bmatrix} .3x_2 + .2x_3 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} .3x_2 \\ x_2 \\ 0 \end{bmatrix} + \begin{bmatrix} .2x_3 \\ 0 \\ x_3 \end{bmatrix} \\ &= x_2 \underbrace{\begin{bmatrix} .3 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{u}} + x_3 \underbrace{\begin{bmatrix} .2 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}} \quad (\text{with } x_2, x_3 \text{ free}) \end{aligned} \tag{2}$$

This calculation shows that every solution of (1) is a linear combination of the vectors
u and v, shown in (2). That is, the solution set is Span {u, v}. Since neither u nor v is a
scalar multiple of the other, the solution set is a plane through the origin. See Fig. 2. ■


Examples 1 and 2, along with the exercises, illustrate the fact that the solution
set of a homogeneous equation Ax = 0 can always be expressed explicitly as
Span $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ for suitable vectors $\mathbf{v}_1, \ldots, \mathbf{v}_p$. If the only solution is the zero vector,
then the solution set is Span {0}. If the equation Ax = 0 has only one free variable,
the solution set is a line through the origin, as in Fig. 1. A plane through the origin,
as in Fig. 2, provides a good mental image for the solution set of Ax = 0 when there
are two or more free variables. Note, however, that a similar figure can be used to
visualize Span {u, v} even when u and v do not arise as solutions of Ax = 0. See Fig. 11
in Section 1.3.


Parametric Vector Form 

The original equation (1) for the plane in Example 2 is an implicit description of the
plane. Solving this equation amounts to finding an explicit description of the plane as
the set spanned by u and v. Equation (2) is called a parametric vector equation of the
plane. Sometimes such an equation is written as

$$\mathbf{x} = s\mathbf{u} + t\mathbf{v} \quad (s, t \text{ in } \mathbb{R})$$

to emphasize that the parameters vary over all real numbers. In Example 1, the equation
$\mathbf{x} = x_3\mathbf{v}$ (with $x_3$ free), or $\mathbf{x} = t\mathbf{v}$ (with t in $\mathbb{R}$), is a parametric vector equation of a line.
Whenever a solution set is described explicitly with vectors as in Examples 1 and 2, we
say that the solution is in parametric vector form.
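The difference between the implicit and the explicit description can be seen by generating points from the parametric form and testing them against the implicit equation. The sketch below uses the vectors u and v of Example 2, written as exact fractions; the function names and the sampled parameter range are choices made here for illustration.

```python
from fractions import Fraction

# u and v from Example 2, written exactly (.3 = 3/10 and .2 = 1/5).
u = [Fraction(3, 10), Fraction(1), Fraction(0)]
v = [Fraction(1, 5), Fraction(0), Fraction(1)]


def point_on_plane(s, t):
    """Explicit description: x = s*u + t*v for parameters s and t."""
    return [s * ui + t * vi for ui, vi in zip(u, v)]


def satisfies_equation(x):
    """Implicit description: 10*x1 - 3*x2 - 2*x3 = 0."""
    return 10 * x[0] - 3 * x[1] - 2 * x[2] == 0


samples = [point_on_plane(s, t) for s in range(-2, 3) for t in range(-2, 3)]
print(all(satisfies_equation(x) for x in samples))   # True
```

The explicit form produces solutions on demand, while the implicit form can only test a candidate that is handed to it.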


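Solutions of Nonhomogeneous Systems

When a nonhomogeneous linear system has many solutions, the general solution can be
written in parametric vector form as one vector plus an arbitrary linear combination of
vectors that satisfy the corresponding homogeneous system.

EXAMPLE 3 Describe all solutions of Ax = b, where

$$A = \begin{bmatrix} 3 & 5 & -4 \\ -3 & -2 & 4 \\ 6 & 1 & -8 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 7 \\ -1 \\ -4 \end{bmatrix}$$

SOLUTION Here A is the matrix of coefficients from Example 1. Row operations on
[ A b ] produce

$$\begin{bmatrix} 3 & 5 & -4 & 7 \\ -3 & -2 & 4 & -1 \\ 6 & 1 & -8 & -4 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -\tfrac{4}{3} & -1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad \begin{aligned} x_1 - \tfrac{4}{3}x_3 &= -1 \\ x_2 &= 2 \\ 0 &= 0 \end{aligned}$$

Thus $x_1 = -1 + \tfrac{4}{3}x_3$, $x_2 = 2$, and $x_3$ is free. As a vector, the general solution of
Ax = b has the form

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 + \tfrac{4}{3}x_3 \\ 2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix} + \begin{bmatrix} \tfrac{4}{3}x_3 \\ 0 \\ x_3 \end{bmatrix} = \underbrace{\begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix}}_{\mathbf{p}} + x_3 \underbrace{\begin{bmatrix} \tfrac{4}{3} \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}}$$

The equation $\mathbf{x} = \mathbf{p} + x_3\mathbf{v}$, or, writing t as a general parameter,

$$\mathbf{x} = \mathbf{p} + t\mathbf{v} \quad (t \text{ in } \mathbb{R}) \tag{3}$$

describes the solution set of Ax = b in parametric vector form. Recall from Example 1
that the solution set of Ax = 0 has the parametric vector equation

$$\mathbf{x} = t\mathbf{v} \quad (t \text{ in } \mathbb{R}) \tag{4}$$

[with the same v that appears in (3)]. Thus the solutions of Ax = b are obtained by
adding the vector p to the solutions of Ax = 0. The vector p itself is just one particular
solution of Ax = b [corresponding to t = 0 in (3)]. ■

FIGURE 3
Adding p to v translates v to v + p.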



FIGURE 4 

Translated line. 


To describe the solution set of $A\mathbf{x} = \mathbf{b}$ geometrically, we can think of vector
addition as a translation. Given $\mathbf{v}$ and $\mathbf{p}$ in $\mathbb{R}^2$ or $\mathbb{R}^3$, the effect of adding $\mathbf{p}$ to $\mathbf{v}$ is
to move $\mathbf{v}$ in a direction parallel to the line through $\mathbf{p}$ and $\mathbf{0}$. We say that $\mathbf{v}$ is translated
by $\mathbf{p}$ to $\mathbf{v} + \mathbf{p}$. See Fig. 3. If each point on a line $L$ in $\mathbb{R}^2$ or $\mathbb{R}^3$ is translated by a vector
$\mathbf{p}$, the result is a line parallel to $L$. See Fig. 4.

Suppose L is the line through 0 and v, described by equation (4). Adding p to each 
point on L produces the translated line described by equation (3). Note that p is on the 
line in equation (3). We call (3) the equation of the line through p parallel to v. Thus 
the solution set of Ax = b is a line through p parallel to the solution set of Ax = 0. 
Figure 5 illustrates this case. 



FIGURE 5 Parallel solution sets of $A\mathbf{x} = \mathbf{b}$ and $A\mathbf{x} = \mathbf{0}$.


The relation between the solution sets of Ax = b and Ax = 0 shown in Fig. 5 
generalizes to any consistent equation Ax = b, although the solution set will be larger 
than a line when there are several free variables. The following theorem gives the precise 
statement. See Exercise 25 for a proof. 
THEOREM 6


Suppose the equation $A\mathbf{x} = \mathbf{b}$ is consistent for some given $\mathbf{b}$, and let $\mathbf{p}$ be a
solution. Then the solution set of $A\mathbf{x} = \mathbf{b}$ is the set of all vectors of the form
$\mathbf{w} = \mathbf{p} + \mathbf{v}_h$, where $\mathbf{v}_h$ is any solution of the homogeneous equation $A\mathbf{x} = \mathbf{0}$.


Theorem 6 says that if $A\mathbf{x} = \mathbf{b}$ has a solution, then the solution set is obtained by
translating the solution set of $A\mathbf{x} = \mathbf{0}$, using any particular solution $\mathbf{p}$ of $A\mathbf{x} = \mathbf{b}$ for
the translation. Figure 6 illustrates the case in which there are two free variables. Even
when $n > 3$, our mental image of the solution set of a consistent system $A\mathbf{x} = \mathbf{b}$ (with
$\mathbf{b} \neq \mathbf{0}$) is either a single nonzero point or a line or plane not passing through the origin.



Warning: Theorem 6 and Fig. 6 apply only to an equation Ax = b that has at least 
one nonzero solution p. When Ax = b has no solution, the solution set is empty. 

The following algorithm outlines the calculations shown in Examples 1, 2, and 3. 


WRITING A SOLUTION SET (OF A CONSISTENT SYSTEM) IN PARAMETRIC 

VECTOR FORM 

1. Row reduce the augmented matrix to reduced echelon form. 

2. Express each basic variable in terms of any free variables appearing in an 
equation. 

3. Write a typical solution x as a vector whose entries depend on the free 
variables, if any. 

4. Decompose x into a linear combination of vectors (with numeric entries) using 
the free variables as parameters. 
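
The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the text's method: the helper names `rref` and `parametric_form` are hypothetical, exact arithmetic is done with the standard `fractions` module, and the system is assumed to be consistent. The input is the augmented matrix of Example 3.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced echelon form, pivot columns), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        m[r] = [x / m[r][c] for x in m[r]]          # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:             # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def parametric_form(augmented):
    """Steps 1-4: return (p, directions) so that x = p + sum of t_i * directions[i]."""
    m, pivots = rref(augmented)                     # step 1
    n = len(augmented[0]) - 1
    if n in pivots:
        raise ValueError("system is inconsistent")
    free = [j for j in range(n) if j not in pivots]
    p = [Fraction(0)] * n
    for row, c in enumerate(pivots):                # steps 2-3: basic variables
        p[c] = m[row][n]
    directions = []
    for f in free:                                  # step 4: one vector per free variable
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for row, c in enumerate(pivots):
            v[c] = -m[row][f]
        directions.append(v)
    return p, directions


p, dirs = parametric_form([[3, 5, -4, 7], [-3, -2, 4, -1], [6, 1, -8, -4]])
print(p, dirs)
```

For Example 3 this returns the particular solution p = (-1, 2, 0) and the single direction v = (4/3, 0, 1), matching equation (3).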


PRACTICE PROBLEMS 

1. Each of the following equations determines a plane in $\mathbb{R}^3$. Do the two planes
intersect? If so, describe their intersection.

\[ \begin{aligned} x_1 + 4x_2 - 5x_3 &= 0 \\ 2x_1 - x_2 + 8x_3 &= 9 \end{aligned} \]

2. Write the general solution of $10x_1 - 3x_2 - 2x_3 = 7$ in parametric vector form, and
relate the solution set to the one found in Example 2.
In Exercises 1-4, determine if the system has a nontrivial solution.
Try to use as few row operations as possible.

1. \[ \begin{aligned} 2x_1 - 5x_2 + 8x_3 &= 0 \\ -2x_1 - 7x_2 + x_3 &= 0 \\ 4x_1 + 2x_2 + 7x_3 &= 0 \end{aligned} \]

2. \[ \begin{aligned} x_1 - 2x_2 + 3x_3 &= 0 \\ -2x_1 - 3x_2 - 4x_3 &= 0 \\ 2x_1 - 4x_2 + 9x_3 &= 0 \end{aligned} \]

3. \[ \begin{aligned} -3x_1 + 4x_2 - 8x_3 &= 0 \\ -2x_1 + 5x_2 + 4x_3 &= 0 \end{aligned} \]

4. \[ \begin{aligned} 5x_1 - 3x_2 + 2x_3 &= 0 \\ -3x_1 - 4x_2 + 2x_3 &= 0 \end{aligned} \]

In Exercises 5 and 6, follow the method of Examples 1 and 2
to write the solution set of the given homogeneous system in
parametric vector form.

5. \[ \begin{aligned} 2x_1 + 2x_2 + 4x_3 &= 0 \\ -4x_1 - 4x_2 - 8x_3 &= 0 \\ -3x_2 - 3x_3 &= 0 \end{aligned} \]

6. \[ \begin{aligned} x_1 + 2x_2 - 3x_3 &= 0 \\ 2x_1 + x_2 - 3x_3 &= 0 \\ -x_1 + x_2 &= 0 \end{aligned} \]


In Exercises 7-12, describe all solutions of Ax = 0 in parametric 
vector form, where A is row equivalent to the given matrix. 


7. 


9. 


11 . 


12 . 


-2 3-6 5 


0 0 


4 -6 


13. Suppose the solution set of a certain system of linear equations
can be described as $x_1 = 5 + 4x_3$, $x_2 = -2 - 7x_3$, with
$x_3$ free. Use vectors to describe this set as a line in $\mathbb{R}^3$.

14. Suppose the solution set of a certain system of linear
equations can be described as $x_1 = 5x_4$, $x_2 = 3 - 2x_4$,
$x_3 = 2 + 5x_4$, with $x_4$ free. Use vectors to describe this set
as a “line” in $\mathbb{R}^4$.

15. Describe and compare the solution sets of $x_1 + 5x_2 - 3x_3 = 0$
and $x_1 + 5x_2 - 3x_3 = -2$.

16. Describe and compare the solution sets of $x_1 - 2x_2 + 3x_3 = 0$
and $x_1 - 2x_2 + 3x_3 = 4$.

17. Follow the method of Example 3 to describe the solutions of
the following system in parametric vector form. Also, give
a geometric description of the solution set and compare it to
that in Exercise 5.

\[ \begin{aligned} 2x_1 + 2x_2 + 4x_3 &= 8 \\ -4x_1 - 4x_2 - 8x_3 &= -16 \\ -3x_2 - 3x_3 &= 12 \end{aligned} \]

18. As in Exercise 17, describe the solutions of the following
system in parametric vector form, and provide a geometric
comparison with the solution set in Exercise 6.

\[ \begin{aligned} x_1 + 2x_2 - 3x_3 &= 5 \\ 2x_1 + x_2 - 3x_3 &= 13 \\ -x_1 + x_2 &= -8 \end{aligned} \]

In Exercises 19 and 20, find the parametric equation of the line 
through a parallel to b. 


19. a 


,b 


-5 


20. a 


-2 


,b 


In Exercises 21 and 22, find a parametric equation of the line M 
through p and q. [Hint: M is parallel to the vector q — p. See the 
figure below.] 


21. p 


,q 


4 


22 . p : 


,q 







In Exercises 23 and 24, mark each statement True or False. Justify 
each answer. 

23. a. A homogeneous equation is always consistent.

b. The equation $A\mathbf{x} = \mathbf{0}$ gives an explicit description of its
solution set.

c. The homogeneous equation $A\mathbf{x} = \mathbf{0}$ has the trivial solution
if and only if the equation has at least one free
variable.

d. The equation $\mathbf{x} = \mathbf{p} + t\mathbf{v}$ describes a line through $\mathbf{v}$
parallel to $\mathbf{p}$.

e. The solution set of $A\mathbf{x} = \mathbf{b}$ is the set of all vectors of
the form $\mathbf{w} = \mathbf{p} + \mathbf{v}_h$, where $\mathbf{v}_h$ is any solution of the
equation $A\mathbf{x} = \mathbf{0}$.

24. a. A homogeneous system of equations can be inconsistent. 

b. If x is a nontrivial solution of Ax = 0, then every entry in 
x is nonzero. 

c. The effect of adding p to a vector is to move the vector in 
a direction parallel to p. 

d. The equation Ax = b is homogeneous if the zero vector 
is a solution. 


7. $\begin{bmatrix} 1 & 3 & -3 & 7 \\ 0 & 1 & -4 & 5 \end{bmatrix}$ 8. $\begin{bmatrix} 1 & -3 & -8 & 5 \\ 0 & 1 & 2 & -4 \end{bmatrix}$

1.5 EXERCISES 


e. If Ax = b is consistent, then the solution set of Ax = b 
is obtained by translating the solution set of Ax = 0. 

25. Prove Theorem 6: 

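a. Suppose $\mathbf{p}$ is a solution of $A\mathbf{x} = \mathbf{b}$, so that $A\mathbf{p} = \mathbf{b}$. Let
$\mathbf{v}_h$ be any solution of the homogeneous equation $A\mathbf{x} = \mathbf{0}$,
and let $\mathbf{w} = \mathbf{p} + \mathbf{v}_h$. Show that $\mathbf{w}$ is a solution of $A\mathbf{x} = \mathbf{b}$.

b. Let $\mathbf{w}$ be any solution of $A\mathbf{x} = \mathbf{b}$, and define $\mathbf{v}_h = \mathbf{w} - \mathbf{p}$.
Show that $\mathbf{v}_h$ is a solution of $A\mathbf{x} = \mathbf{0}$. This shows that
every solution of $A\mathbf{x} = \mathbf{b}$ has the form $\mathbf{w} = \mathbf{p} + \mathbf{v}_h$, with
$\mathbf{p}$ a particular solution of $A\mathbf{x} = \mathbf{b}$ and $\mathbf{v}_h$ a solution of
$A\mathbf{x} = \mathbf{0}$.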

26. Suppose A is the 3x3 zero matrix (with all zero entries). 
Describe the solution set of the equation Ax = 0. 

27. Suppose Ax = b has a solution. Explain why the solution is 
unique precisely when Ax = 0 has only the trivial solution. 


In Exercises 28-31, (a) does the equation $A\mathbf{x} = \mathbf{0}$ have a nontrivial
solution and (b) does the equation $A\mathbf{x} = \mathbf{b}$ have at least one
solution for every possible $\mathbf{b}$?

28. $A$ is a 3 x 3 matrix with three pivot positions.

29. $A$ is a 4 x 4 matrix with three pivot positions.

30. $A$ is a 2 x 5 matrix with two pivot positions.

31. $A$ is a 3 x 2 matrix with two pivot positions.

32. If $\mathbf{b} \neq \mathbf{0}$, can the solution set of $A\mathbf{x} = \mathbf{b}$ be a plane through
the origin? Explain.


34. Construct a 3 x 3 nonzero matrix A such that the vector 
2 " 

—1 is a solution of Ax = 0. 


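35. Given $A = \begin{bmatrix} -1 & -3 \\ 7 & 21 \\ -2 & -6 \end{bmatrix}$, find one nontrivial solution of
$A\mathbf{x} = \mathbf{0}$ by inspection. [Hint: Think of the equation $A\mathbf{x} = \mathbf{0}$
written as a vector equation.]

36. Given $A = \begin{bmatrix} 3 & -2 \\ -6 & 4 \\ 12 & -8 \end{bmatrix}$, find one nontrivial solution of
$A\mathbf{x} = \mathbf{0}$ by inspection.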


37. Construct a 2 x 2 matrix $A$ such that the solution set of the
equation $A\mathbf{x} = \mathbf{0}$ is the line in $\mathbb{R}^2$ through (4, 1) and the
origin. Then, find a vector $\mathbf{b}$ in $\mathbb{R}^2$ such that the solution
set of $A\mathbf{x} = \mathbf{b}$ is not a line in $\mathbb{R}^2$ parallel to the solution set
of $A\mathbf{x} = \mathbf{0}$. Why does this not contradict Theorem 6?


38. Let $A$ be an m x n matrix and let $\mathbf{w}$ be a vector in $\mathbb{R}^n$ that
satisfies the equation $A\mathbf{x} = \mathbf{0}$. Show that for any scalar $c$,
the vector $c\mathbf{w}$ also satisfies $A\mathbf{x} = \mathbf{0}$. [That is, show that
$A(c\mathbf{w}) = \mathbf{0}$.]


39. Let $A$ be an m x n matrix, and let $\mathbf{v}$ and $\mathbf{w}$ be vectors in
$\mathbb{R}^n$ with the property that $A\mathbf{v} = \mathbf{0}$ and $A\mathbf{w} = \mathbf{0}$. Explain
why $A(\mathbf{v} + \mathbf{w})$ must be the zero vector. Then explain why
$A(c\mathbf{v} + d\mathbf{w}) = \mathbf{0}$ for each pair of scalars $c$ and $d$.


33. Construct a 3 x 3 nonzero matrix A such that the vector 1 

_ 1 

is a solution of Ax = 0. 


40. Suppose $A$ is a 3 x 3 matrix and $\mathbf{b}$ is a vector in $\mathbb{R}^3$ such that
the equation $A\mathbf{x} = \mathbf{b}$ does not have a solution. Does there
exist a vector $\mathbf{y}$ in $\mathbb{R}^3$ such that the equation $A\mathbf{x} = \mathbf{y}$ has a
unique solution? Discuss.


SOLUTIONS TO PRACTICE PROBLEMS 


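1. Row reduce the augmented matrix:

\[ \begin{bmatrix} 1 & 4 & -5 & 0 \\ 2 & -1 & 8 & 9 \end{bmatrix} \sim \begin{bmatrix} 1 & 4 & -5 & 0 \\ 0 & -9 & 18 & 9 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 3 & 4 \\ 0 & 1 & -2 & -1 \end{bmatrix}, \qquad \begin{aligned} x_1 + 3x_3 &= 4 \\ x_2 - 2x_3 &= -1 \end{aligned} \]

Thus $x_1 = 4 - 3x_3$, $x_2 = -1 + 2x_3$, with $x_3$ free. The general solution in parametric
vector form is

\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 - 3x_3 \\ -1 + 2x_3 \\ x_3 \end{bmatrix} = \underbrace{\begin{bmatrix} 4 \\ -1 \\ 0 \end{bmatrix}}_{\mathbf{p}} + x_3 \underbrace{\begin{bmatrix} -3 \\ 2 \\ 1 \end{bmatrix}}_{\mathbf{v}} \]

The intersection of the two planes is the line through $\mathbf{p}$ in the direction of $\mathbf{v}$.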
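2. The augmented matrix $[\,10 \;\; -3 \;\; -2 \;\; 7\,]$ is row equivalent to $[\,1 \;\; -.3 \;\; -.2 \;\; .7\,]$,
and the general solution is $x_1 = .7 + .3x_2 + .2x_3$, with $x_2$ and $x_3$ free. That is,

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} .7 + .3x_2 + .2x_3 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} .7 \\ 0 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} .3 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} .2 \\ 0 \\ 1 \end{bmatrix} = \mathbf{p} + x_2\mathbf{u} + x_3\mathbf{v} \]

The solution set of the nonhomogeneous equation $A\mathbf{x} = \mathbf{b}$ is the translated plane
$\mathbf{p} + \text{Span}\,\{\mathbf{u}, \mathbf{v}\}$, which passes through $\mathbf{p}$ and is parallel to the solution set of the
homogeneous equation in Example 2.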

1.6 APPLICATIONS OF LINEAR SYSTEMS 


You might expect that a real-life problem involving linear algebra would have only one 
solution, or perhaps no solution. The purpose of this section is to show how linear 
systems with many solutions can arise naturally. The applications here come from 
economics, chemistry, and network flow. 

A Homogeneous System in Economics 

The system of 500 equations in 500 variables, mentioned in this chapter’s introduction,
is now known as a Leontief “input-output” (or “production”) model.¹ Section 2.6 will
examine this model in more detail, when more theory and better notation are available.
For now, we look at a simpler “exchange model,” also due to Leontief.

Suppose a nation’s economy is divided into many sectors, such as various manu¬ 
facturing, communication, entertainment, and service industries. Suppose that for each 
sector we know its total output for one year and we know exactly how this output is 
divided or “exchanged” among the other sectors of the economy. Let the total dollar 
value of a sector’s output be called the price of that output. Leontief proved the 
following result. 

There exist equilibrium prices that can be assigned to the total outputs of the 
various sectors in such a way that the income of each sector exactly balances its 
expenses. 

The following example shows how to find the equilibrium prices. 

EXAMPLE 1 Suppose an economy consists of the Coal, Electric (power), and Steel 
sectors, and the output of each sector is distributed among the various sectors as shown 
in Table 1 on page 50, where the entries in a column represent the fractional parts of a 
sector’s total output. 

The second column of Table 1, for instance, says that the total output of the Electric 
sector is divided as follows: 40% to Coal, 50% to Steel, and the remaining 10% to 
Electric. (Electric treats this 10% as an expense it incurs in order to operate its business.) 
Since all output must be taken into account, the decimal fractions in each column must 
sum to 1. 


¹ See Wassily W. Leontief, “Input-Output Economics,” Scientific American, October 1951, pp. 15-21.

















Denote the prices (i.e., dollar values) of the total annual outputs of the Coal,
Electric, and Steel sectors by $p_C$, $p_E$, and $p_S$, respectively. If possible, find equilibrium
prices that make each sector’s income match its expenditures.



TABLE 1 A Simple Economy

    Distribution of Output from:
    Coal      Electric      Steel      Purchased by:
    .0        .4            .6         Coal
    .6        .1            .2         Electric
    .4        .5            .2         Steel

SOLUTION A sector looks down a column to see where its output goes, and it looks
across a row to see what it needs as inputs. For instance, the first row of Table 1
says that Coal receives (and pays for) 40% of the Electric output and 60% of the Steel
output. Since the respective values of the total outputs are $p_E$ and $p_S$, Coal must spend
$.4p_E$ dollars for its share of Electric’s output and $.6p_S$ for its share of Steel’s output.
Thus Coal’s total expenses are $.4p_E + .6p_S$. To make Coal’s income, $p_C$, equal to its
expenses, we want

\[ p_C = .4p_E + .6p_S \tag{1} \]

The second row of the exchange table shows that the Electric sector spends $.6p_C$
for coal, $.1p_E$ for electricity, and $.2p_S$ for steel. Hence the income/expense requirement
for Electric is

\[ p_E = .6p_C + .1p_E + .2p_S \tag{2} \]

Finally, the third row of the exchange table leads to the final requirement:

\[ p_S = .4p_C + .5p_E + .2p_S \tag{3} \]

To solve the system of equations (1), (2), and (3), move all the unknowns to the left
sides of the equations and combine like terms. [For instance, on the left side of (2),
write $p_E - .1p_E$ as $.9p_E$.]

\[ \begin{aligned} p_C - .4p_E - .6p_S &= 0 \\ -.6p_C + .9p_E - .2p_S &= 0 \\ -.4p_C - .5p_E + .8p_S &= 0 \end{aligned} \]

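Row reduction is next. For simplicity here, decimals are rounded to two places.

\[ \begin{bmatrix} 1 & -.4 & -.6 & 0 \\ -.6 & .9 & -.2 & 0 \\ -.4 & -.5 & .8 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -.4 & -.6 & 0 \\ 0 & .66 & -.56 & 0 \\ 0 & -.66 & .56 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -.4 & -.6 & 0 \\ 0 & .66 & -.56 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

\[ \sim \begin{bmatrix} 1 & -.4 & -.6 & 0 \\ 0 & 1 & -.85 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -.94 & 0 \\ 0 & 1 & -.85 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]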
The general solution is $p_C = .94p_S$, $p_E = .85p_S$, and $p_S$ is free. The equilibrium price
vector for the economy has the form

\[ \mathbf{p} = \begin{bmatrix} p_C \\ p_E \\ p_S \end{bmatrix} = \begin{bmatrix} .94p_S \\ .85p_S \\ p_S \end{bmatrix} = p_S \begin{bmatrix} .94 \\ .85 \\ 1 \end{bmatrix} \]

Any (nonnegative) choice for $p_S$ results in a choice of equilibrium prices. For instance,
if we take $p_S$ to be 100 (or $100 million), then $p_C = 94$ and $p_E = 85$. The incomes and
expenditures of each sector will be equal if the output of Coal is priced at $94 million,
that of Electric at $85 million, and that of Steel at $100 million. ■
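
As a check on the rounded values, the equilibrium prices can be computed exactly. The sketch below is an illustration only: it fixes the Steel price at 1 and solves equations (1) and (2) for the other two prices by Cramer's rule on the resulting 2 x 2 system, which is a substitute for the row reduction used in the text. The names `T`, `prices`, and `expenses` are hypothetical.

```python
from fractions import Fraction as F

# Exchange table from Table 1:
# T[i][j] = fraction of sector j's output purchased by sector i
# (sector order: Coal, Electric, Steel).
T = [[F(0), F(4, 10), F(6, 10)],
     [F(6, 10), F(1, 10), F(2, 10)],
     [F(4, 10), F(5, 10), F(2, 10)]]

# With p_S = 1, equations (1) and (2) become
#   (1 - T00) p_C -      T01  p_E = T02
#       -T10  p_C + (1 - T11) p_E = T12
a11, a12, b1 = 1 - T[0][0], -T[0][1], T[0][2]
a21, a22, b2 = -T[1][0], 1 - T[1][1], T[1][2]
det = a11 * a22 - a12 * a21
p_C = (b1 * a22 - a12 * b2) / det
p_E = (a11 * b2 - a21 * b1) / det
prices = [p_C, p_E, F(1)]

# Each sector's expenses (a row of the table times the prices) should equal its income.
expenses = [sum(T[i][j] * prices[j] for j in range(3)) for i in range(3)]
print(prices, expenses == prices)
```

The exact prices are 31/33 and 28/33 of the Steel price, which round to the .94 and .85 found above.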

Balancing Chemical Equations 

Chemical equations describe the quantities of substances consumed and produced
by chemical reactions. For instance, when propane gas burns, the propane ($\mathrm{C_3H_8}$)
combines with oxygen ($\mathrm{O_2}$) to form carbon dioxide ($\mathrm{CO_2}$) and water ($\mathrm{H_2O}$), according
to an equation of the form

\[ (x_1)\mathrm{C_3H_8} + (x_2)\mathrm{O_2} \rightarrow (x_3)\mathrm{CO_2} + (x_4)\mathrm{H_2O} \tag{4} \]

To “balance” this equation, a chemist must find whole numbers $x_1, \ldots, x_4$ such that the
total numbers of carbon (C), hydrogen (H), and oxygen (O) atoms on the left match the
corresponding numbers of atoms on the right (because atoms are neither destroyed nor
created in the reaction).

A systematic method for balancing chemical equations is to set up a vector equation
that describes the numbers of atoms of each type present in a reaction. Since equation
(4) involves three types of atoms (carbon, hydrogen, and oxygen), construct a vector in
$\mathbb{R}^3$ for each reactant and product in (4) that lists the numbers of “atoms per molecule,”
as follows:



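\[ \mathrm{C_3H_8}: \begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix}, \quad \mathrm{O_2}: \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}, \quad \mathrm{CO_2}: \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \quad \mathrm{H_2O}: \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} \quad \begin{matrix} \leftarrow \text{Carbon} \\ \leftarrow \text{Hydrogen} \\ \leftarrow \text{Oxygen} \end{matrix} \]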


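To balance equation (4), the coefficients $x_1, \ldots, x_4$ must satisfy

\[ x_1 \begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} = x_3 \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} + x_4 \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} \]

To solve, move all the terms to the left (changing the signs in the third and fourth
vectors):

\[ x_1 \begin{bmatrix} 3 \\ 8 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} + x_3 \begin{bmatrix} -1 \\ 0 \\ -2 \end{bmatrix} + x_4 \begin{bmatrix} 0 \\ -2 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]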


Row reduction of the augmented matrix for this equation leads to the general solution

\[ x_1 = \tfrac{1}{4}x_4, \quad x_2 = \tfrac{5}{4}x_4, \quad x_3 = \tfrac{3}{4}x_4, \quad \text{with } x_4 \text{ free} \]

Since the coefficients in a chemical equation must be integers, take $x_4 = 4$, in which
case $x_1 = 1$, $x_2 = 5$, and $x_3 = 3$. The balanced equation is

\[ \mathrm{C_3H_8} + 5\mathrm{O_2} \rightarrow 3\mathrm{CO_2} + 4\mathrm{H_2O} \]

The equation would also be balanced if, for example, each coefficient were doubled. For 
most purposes, however, chemists prefer to use a balanced equation whose coefficients 
are the smallest possible whole numbers. 
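
The balanced propane equation can be confirmed by counting atoms directly. The sketch below is illustrative only: instead of row reduction it simply tries every combination of coefficients from 1 to 10 (a brute-force search that is reasonable only for very small reactions) and keeps those for which the atom counts match. The names `reactants`, `products`, and `is_balanced` are hypothetical.

```python
from itertools import product

# Atoms per molecule, listed as (carbon, hydrogen, oxygen).
reactants = [(3, 8, 0), (0, 0, 2)]   # C3H8, O2
products = [(1, 0, 2), (0, 2, 1)]    # CO2, H2O


def is_balanced(coeffs):
    """True if x1*C3H8 + x2*O2 has the same atom counts as x3*CO2 + x4*H2O."""
    x1, x2, x3, x4 = coeffs
    left = [x1 * a + x2 * b for a, b in zip(*reactants)]
    right = [x3 * a + x4 * b for a, b in zip(*products)]
    return left == right


# Brute-force search over whole-number coefficients 1..10.
solutions = [c for c in product(range(1, 11), repeat=4) if is_balanced(c)]
smallest = min(solutions, key=sum)
print(smallest, solutions)
```

Within this search range the only balanced choices are (1, 5, 3, 4) and its double, in agreement with the general solution in which every coefficient is a fixed multiple of the free variable.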
Network Flow


FIGURE 1

A junction, or node.


Systems of linear equations arise naturally when scientists, engineers, or economists 
study the flow of some quantity through a network. For instance, urban planners and 
traffic engineers monitor the pattern of traffic flow in a grid of city streets. Electrical 
engineers calculate current flow through electrical circuits. And economists analyze 
the distribution of products from manufacturers to consumers through a network of 
wholesalers and retailers. For many networks, the systems of equations involve 
hundreds or even thousands of variables and equations. 

A network consists of a set of points called junctions, or nodes, with lines or arcs
called branches connecting some or all of the junctions. The direction of flow in each
branch is indicated, and the flow amount (or rate) is either shown or is denoted by a
variable.

The basic assumption of network flow is that the total flow into the network equals
the total flow out of the network and that the total flow into a junction equals the total
flow out of the junction. For example, Fig. 1 shows 30 units flowing into a junction
through one branch, with $x_1$ and $x_2$ denoting the flows out of the junction through other
branches. Since the flow is “conserved” at each junction, we must have $x_1 + x_2 = 30$.
In a similar fashion, the flow at each junction is described by a linear equation. The 
problem of network analysis is to determine the flow in each branch when partial 
information (such as the flow into and out of the network) is known. 


EXAMPLE 2 The network in Fig. 2 shows the traffic flow (in vehicles per hour) 
over several one-way streets in downtown Baltimore during a typical early afternoon. 
Determine the general flow pattern for the network. 


FIGURE 2 Baltimore streets.


SOLUTION Write equations that describe the flow, and then find the general solution 
of the system. Label the street intersections (junctions) and the unknown flows in the 
branches, as shown in Fig. 2. At each intersection, set the flow in equal to the flow out. 


    Intersection      Flow in          Flow out
    A                 300 + 500   =    x1 + x2
    B                 x2 + x4     =    300 + x3
    C                 100 + 400   =    x4 + x5
    D                 x1 + x5     =    600
Also, the total flow into the network (500 + 300 + 100 + 400) equals the total flow
out of the network ($300 + x_3 + 600$), which simplifies to $x_3 = 400$. Combine this
equation with a rearrangement of the first four equations to obtain the following system
of equations:

\[ \begin{aligned} x_1 + x_2 &= 800 \\ x_2 - x_3 + x_4 &= 300 \\ x_4 + x_5 &= 500 \\ x_1 + x_5 &= 600 \\ x_3 &= 400 \end{aligned} \]

Row reduction of the associated augmented matrix leads to

\[ \begin{aligned} x_1 + x_5 &= 600 \\ x_2 - x_5 &= 200 \\ x_3 &= 400 \\ x_4 + x_5 &= 500 \end{aligned} \]

The general flow pattern for the network is described by

\[ \begin{cases} x_1 = 600 - x_5 \\ x_2 = 200 + x_5 \\ x_3 = 400 \\ x_4 = 500 - x_5 \\ x_5 \text{ is free} \end{cases} \]

A negative flow in a network branch corresponds to flow in the direction opposite
to that shown on the model. Since the streets in this problem are one-way, none of the
variables here can be negative. This fact leads to certain limitations on the possible
values of the variables. For instance, $x_5 \le 500$ because $x_4$ cannot be negative. Other
constraints on the variables are considered in Practice Problem 2. ■
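
The general flow pattern can be checked against the four junction equations, and the effect of the one-way restriction can be seen by scanning integer values of the free variable. This is a minimal sketch under the assumption that flows are whole numbers of vehicles per hour; the function names `flows` and `satisfies_junctions` are hypothetical.

```python
def flows(x5):
    """General flow pattern of Example 2, written in terms of the free variable x5."""
    return {"x1": 600 - x5, "x2": 200 + x5, "x3": 400, "x4": 500 - x5, "x5": x5}


def satisfies_junctions(f):
    """Check flow in = flow out at intersections A, B, C, and D."""
    return (300 + 500 == f["x1"] + f["x2"]
            and f["x2"] + f["x4"] == 300 + f["x3"]
            and 100 + 400 == f["x4"] + f["x5"]
            and f["x1"] + f["x5"] == 600)


# Every value of x5 satisfies the junction equations, but only some values
# keep all five one-way flows nonnegative.
candidates = range(-100, 701)
feasible = [t for t in candidates
            if satisfies_junctions(flows(t)) and all(v >= 0 for v in flows(t).values())]
print(min(feasible), max(feasible))
```

The scan shows that the flows stay nonnegative exactly when the free variable lies between 0 and 500.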


PRACTICE PROBLEMS 


1. Suppose an economy has three sectors: Agriculture, Mining, and Manufacturing. 
Agriculture sells 5% of its output to Mining and 30% to Manufacturing, and retains 
the rest. Mining sells 20% of its output to Agriculture and 70% to Manufacturing, 
and retains the rest. Manufacturing sells 20% of its output to Agriculture and 30% to 
Mining, and retains the rest. Determine the exchange table for this economy, where 
the columns describe how the output of each sector is exchanged among the three 
sectors. 

2. Consider the network flow studied in Example 2. Determine the possible range of
values of $x_1$ and $x_2$. [Hint: The example showed that $x_5 \le 500$. What does this
imply about $x_1$ and $x_2$? Also, use the fact that $x_5 \ge 0$.]
1.6 EXERCISES

1. Suppose an economy has only two sectors: Goods and Services.
Each year, Goods sells 80% of its output to Services
and keeps the rest, while Services sells 70% of its output to
Goods and retains the rest. Find equilibrium prices for the
annual outputs of the Goods and Services sectors that make
each sector’s income match its expenditures.






2. Find another set of equilibrium prices for the economy in 
Example 1. Suppose the same economy used Japanese 
yen instead of dollars to measure the values of the various 
sectors’ outputs. Would this change the problem in any way? 
Discuss. 

3. Consider an economy with three sectors: Fuels and Power, 
Manufacturing, and Services. Fuels and Power sells 80% 
of its output to Manufacturing, 10% to Services, and retains 
the rest. Manufacturing sells 10% of its output to Fuels and 
Power, 80% to Services, and retains the rest. Services sells 
20% to Fuels and Power, 40% to Manufacturing, and retains 
the rest. 

a. Construct the exchange table for this economy. 

b. Develop a system of equations that leads to prices at 
which each sector’s income matches its expenses. Then 
write the augmented matrix that can be row reduced to 
find these prices. 

c. [M] Find a set of equilibrium prices when the price for 
the Services output is 100 units. 

4. Suppose an economy has four sectors: Mining, Lumber, 
Energy, and Transportation. Mining sells 10% of its output 
to Lumber, 60% to Energy, and retains the rest. Lumber 
sells 15% of its output to Mining, 50% to Energy, 20% to 
Transportation, and retains the rest. Energy sells 20% of its 
output to Mining, 15% to Lumber, 20% to Transportation, 
and retains the rest. Transportation sells 20% of its output to 
Mining, 10% to Lumber, 50% to Energy, and retains the rest. 

a. Construct the exchange table for this economy. 

b. [M] Find a set of equilibrium prices for the economy. 

5. An economy has four sectors: Agriculture, Manufacturing,
Services, and Transportation. Agriculture sells 20% of its
output to Manufacturing, 30% to Services, 30% to Transportation,
and retains the rest. Manufacturing sells 35% of its
output to Agriculture, 35% to Services, 20% to Transportation,
and retains the rest. Services sells 10% of its output to
Agriculture, 20% to Manufacturing, 20% to Transportation,
and retains the rest. Transportation sells 20% of its output
to Agriculture, 30% to Manufacturing, 20% to Services, and
retains the rest.

a. Construct the exchange table for this economy. 

b. [M] Find a set of equilibrium prices for the economy if 
the value of Transportation is $10.00 per unit. 

c. The Services sector launches a successful “eat farm fresh”
campaign, and increases its share of the output from the
Agricultural sector to 40%, whereas the share of Agricultural
production going to Manufacturing falls to 10%.
Construct the exchange table for this new economy.

d. [M] Find a set of equilibrium prices for this new economy 
if the value of Transportation is still $10.00 per unit. 
What effect has the “eat farm fresh” campaign had on the 
equilibrium prices for the sectors in this economy? 

Balance the chemical equations in Exercises 6-11 using the vector 
equation approach discussed in this section. 

6. Aluminum oxide and carbon react to create elemental aluminum
and carbon dioxide:

\[ \mathrm{Al_2O_3} + \mathrm{C} \rightarrow \mathrm{Al} + \mathrm{CO_2} \]

[For each compound, construct a vector that lists the numbers
of atoms of aluminum, oxygen, and carbon.]

7. Alka-Seltzer contains sodium bicarbonate ($\mathrm{NaHCO_3}$) and
citric acid ($\mathrm{H_3C_6H_5O_7}$). When a tablet is dissolved in water,
the following reaction produces sodium citrate, water, and
carbon dioxide (gas):

\[ \mathrm{NaHCO_3} + \mathrm{H_3C_6H_5O_7} \rightarrow \mathrm{Na_3C_6H_5O_7} + \mathrm{H_2O} + \mathrm{CO_2} \]

8. Limestone, $\mathrm{CaCO_3}$, neutralizes the acid, $\mathrm{H_3O}$, in acid rain by
the following unbalanced equation:

\[ \mathrm{H_3O} + \mathrm{CaCO_3} \rightarrow \mathrm{H_2O} + \mathrm{Ca} + \mathrm{CO_2} \]

9. Boron sulfide reacts violently with water to form boric acid
and hydrogen sulfide gas (the smell of rotten eggs). The
unbalanced equation is

\[ \mathrm{B_2S_3} + \mathrm{H_2O} \rightarrow \mathrm{H_3BO_3} + \mathrm{H_2S} \]

10. [M] If possible, use exact arithmetic or a rational format for
calculations in balancing the following chemical reaction:

\[ \mathrm{PbN_6} + \mathrm{CrMn_2O_8} \rightarrow \mathrm{Pb_3O_4} + \mathrm{Cr_2O_3} + \mathrm{MnO_2} + \mathrm{NO} \]

11. [M] The chemical reaction below can be used in some industrial
processes, such as the production of arsene ($\mathrm{AsH_3}$).
Use exact arithmetic or a rational format for calculations to
balance this equation.

\[ \mathrm{MnS} + \mathrm{As_2Cr_{10}O_{35}} + \mathrm{H_2SO_4} \rightarrow \mathrm{HMnO_4} + \mathrm{AsH_3} + \mathrm{CrS_3O_{12}} + \mathrm{H_2O} \]
12. Find the general flow pattern of the network shown in the
figure. Assuming that the flows are all nonnegative, what is 
the smallest possible value for x^l 




13. a. Find the general flow pattern of the network shown in the 
figure. 

b. Assuming that the flow must be in the directions indicated,
find the minimum flows in the branches denoted
by $x_2$, $x_3$, $x_4$, and $x_5$.





14. a. Find the general traffic pattern of the freeway network 


shown in the figure. (Flow rates are in cars/minute.) 

b. Describe the general traffic pattern when the road whose 
flow is X 5 is closed. 

c. When X 5 = 0, what is the minimum value of X 4 ? 





15. Intersections in England are often constructed as one-way 
“roundabouts,” such as the one shown in the figure. Assume 
that traffic must travel in the directions shown. Find the 
general solution of the network flow. Find the smallest 
possible value for x^. 



SOLUTIONS TO PRACTICE PROBLEMS 


1. Write the percentages as decimals. Since all output must be taken into account, each 
column must sum to 1. This fact helps to fill in any missing entries. 


    Distribution of Output from:
    Agriculture      Mining      Manufacturing      Purchased by:
    .65              .20         .20                Agriculture
    .05              .10         .30                Mining
    .30              .70         .50                Manufacturing


2. Since $x_5 \le 500$, the equations D and A for $x_1$ and $x_2$ imply that $x_1 \ge 100$
and $x_2 \le 700$. The fact that $x_5 \ge 0$ implies that $x_1 \le 600$ and $x_2 \ge 200$. So,
$100 \le x_1 \le 600$, and $200 \le x_2 \le 700$.


1.7 LINEAR INDEPENDENCE 


The homogeneous equations in Section 1.5 can be studied from a different perspective
by writing them as vector equations. In this way, the focus shifts from the unknown
solutions of $A\mathbf{x} = \mathbf{0}$ to the vectors that appear in the vector equations.
For instance, consider the equation

\[ x_1 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + x_2 \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} + x_3 \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{1} \]

This equation has a trivial solution, of course, where $x_1 = x_2 = x_3 = 0$. As in
Section 1.5, the main issue is whether the trivial solution is the only one.


DEFINITION An indexed set of vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ is said to be linearly independent
if the vector equation

\[ x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_p\mathbf{v}_p = \mathbf{0} \]

has only the trivial solution. The set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is said to be linearly dependent
if there exist weights $c_1, \ldots, c_p$, not all zero, such that

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p = \mathbf{0} \tag{2} \]


Equation (2) is called a linear dependence relation among $\mathbf{v}_1, \ldots, \mathbf{v}_p$ when the
weights are not all zero. An indexed set is linearly dependent if and only if it is not
linearly independent. For brevity, we may say that $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are linearly dependent
when we mean that $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is a linearly dependent set. We use analogous
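terminology for linearly independent sets.

EXAMPLE 1 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$, and $\mathbf{v}_3 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$.

a. Determine if the set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent.

b. If possible, find a linear dependence relation among $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

SOLUTION

a. We must determine if there is a nontrivial solution of equation (1) above. Row operations
on the associated augmented matrix show that

\[ \begin{bmatrix} 1 & 4 & 2 & 0 \\ 2 & 5 & 1 & 0 \\ 3 & 6 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 4 & 2 & 0 \\ 0 & -3 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

Clearly, $x_1$ and $x_2$ are basic variables, and $x_3$ is free. Each nonzero value of $x_3$
determines a nontrivial solution of (1). Hence $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ are linearly dependent (and
not linearly independent).

b. To find a linear dependence relation among $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, completely row reduce
the augmented matrix and write the new system:

\[ \begin{bmatrix} 1 & 0 & -2 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad \begin{aligned} x_1 - 2x_3 &= 0 \\ x_2 + x_3 &= 0 \\ 0 &= 0 \end{aligned} \]

Thus $x_1 = 2x_3$, $x_2 = -x_3$, and $x_3$ is free. Choose any nonzero value for $x_3$, say,
$x_3 = 5$. Then $x_1 = 10$ and $x_2 = -5$. Substitute these values into equation (1) and
obtain

\[ 10\mathbf{v}_1 - 5\mathbf{v}_2 + 5\mathbf{v}_3 = \mathbf{0} \]

This is one (out of infinitely many) possible linear dependence relations among $\mathbf{v}_1$,
$\mathbf{v}_2$, and $\mathbf{v}_3$. ■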
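
The dependence relation found in Example 1 is easy to verify numerically. The short sketch below is illustrative only: it forms the linear combination entry by entry, using a hypothetical helper named `combo`, and also checks the whole family of weights (2t, -t, t) given by the general solution.

```python
# The three vectors of Example 1.
v1, v2, v3 = (1, 2, 3), (4, 5, 6), (2, 1, 0)


def combo(weights, vectors):
    """Return the linear combination sum of w_k * v_k, computed entry by entry."""
    size = len(vectors[0])
    return tuple(sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(size))


# The relation 10 v1 - 5 v2 + 5 v3 from the example.
relation = combo((10, -5, 5), (v1, v2, v3))
print(relation)

# Every choice x3 = t gives weights (2t, -t, t) and again the zero vector.
family = [combo((2 * t, -t, t), (v1, v2, v3)) for t in range(-3, 4)]
print(all(r == (0, 0, 0) for r in family))
```

Any nonzero value of t gives another valid linear dependence relation, which is why the example speaks of infinitely many.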
Linear Independence of Matrix Columns

Suppose that we begin with a matrix $A = [\,\mathbf{a}_1 \;\cdots\; \mathbf{a}_n\,]$ instead of a set of vectors. The
matrix equation $A\mathbf{x} = \mathbf{0}$ can be written as

\[ x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{0} \]

Each linear dependence relation among the columns of $A$ corresponds to a nontrivial
solution of $A\mathbf{x} = \mathbf{0}$. Thus we have the following important fact.


The columns of a matrix $A$ are linearly independent if and only if the equation
$A\mathbf{x} = \mathbf{0}$ has only the trivial solution. (3)


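EXAMPLE 2 Determine if the columns of the matrix $A = \begin{bmatrix} 0 & 1 & 4 \\ 1 & 2 & -1 \\ 5 & 8 & 0 \end{bmatrix}$ are
linearly independent.

SOLUTION To study $A\mathbf{x} = \mathbf{0}$, row reduce the augmented matrix:

\[ \begin{bmatrix} 0 & 1 & 4 & 0 \\ 1 & 2 & -1 & 0 \\ 5 & 8 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 4 & 0 \\ 0 & -2 & 5 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 1 & 4 & 0 \\ 0 & 0 & 13 & 0 \end{bmatrix} \]

At this point, it is clear that there are three basic variables and no free variables. So
the equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, and the columns of $A$ are linearly
independent. ■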
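
Fact (3) suggests a mechanical test: reduce the matrix to echelon form and count pivots, since the columns are independent exactly when every column is a pivot column (no free variables). The sketch below is one possible way to do this with exact fractions; the names `pivot_count` and `columns_independent` are hypothetical, and only forward elimination is performed because the reduced form is not needed for counting.

```python
from fractions import Fraction


def pivot_count(rows):
    """Count pivot positions by forward elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for c in range(len(m[0])):
        # Find a nonzero entry in column c at or below the current pivot row.
        pr = next((i for i in range(rank, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue                       # no pivot in this column
        m[rank], m[pr] = m[pr], m[rank]
        for i in range(rank + 1, len(m)):  # create zeros below the pivot
            f = m[i][c] / m[rank][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def columns_independent(rows):
    """Columns are independent exactly when every column contains a pivot."""
    return pivot_count(rows) == len(rows[0])


A = [[0, 1, 4], [1, 2, -1], [5, 8, 0]]     # matrix of Example 2
B = [[1, 4, 2], [2, 5, 1], [3, 6, 0]]      # columns are v1, v2, v3 of Example 1
print(columns_independent(A), columns_independent(B))
```

The first matrix has three pivots, so its columns are independent; the second has only two, in agreement with the dependence relation found in Example 1.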


Sets of One or Two Vectors 

A set containing only one vector, say, $\mathbf{v}$, is linearly independent if and only if $\mathbf{v}$ is
not the zero vector. This is because the vector equation $x_1\mathbf{v} = \mathbf{0}$ has only the trivial
solution when $\mathbf{v} \neq \mathbf{0}$. The zero vector is linearly dependent because $x_1\mathbf{0} = \mathbf{0}$ has many
nontrivial solutions.

The next example will explain the nature of a linearly dependent set of two vectors. 

EXAMPLE 3 Determine if the following sets of vectors are linearly independent.

a. v1 = (3, 1), v2 = (6, 2)

b. v1 = (3, 2), v2 = (6, 2)

SOLUTION

a. Notice that v2 is a multiple of v1, namely, v2 = 2v1. Hence -2v1 + v2 = 0, which
shows that {v1, v2} is linearly dependent.

b. The vectors v1 and v2 are certainly not multiples of one another. Could they be
linearly dependent? Suppose c and d satisfy

\[ c\mathbf{v}_1 + d\mathbf{v}_2 = \mathbf{0} \]

If c ≠ 0, then we can solve for v1 in terms of v2, namely, v1 = (-d/c)v2. This
result is impossible because v1 is not a multiple of v2. So c must be zero. Similarly,
d must also be zero. Thus {v1, v2} is a linearly independent set. ■
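The inspection test for two vectors can also be checked mechanically. The sketch below is an illustration, not the text's method: it uses the fact that two vectors are multiples of one another exactly when every 2 × 2 determinant formed from matching pairs of entries is zero. The function name `is_dependent_pair` is hypothetical, and the sample vectors are the pairs (3, 1), (6, 2) and (3, 2), (6, 2) as read from Example 3.

```python
def is_dependent_pair(u, v):
    """True if at least one of u, v is a scalar multiple of the other."""
    n = len(u)
    # All 2x2 minors u[i]*v[j] - u[j]*v[i] vanish exactly when {u, v} is dependent.
    return all(u[i] * v[j] - u[j] * v[i] == 0
               for i in range(n) for j in range(i + 1, n))

print(is_dependent_pair((3, 1), (6, 2)))  # True, since v2 = 2 v1
print(is_dependent_pair((3, 2), (6, 2)))  # False
```

A pair containing the zero vector is reported as dependent, which agrees with the rule that the zero vector is a multiple (zero times) of any vector.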
FIGURE 1 Left panel: the points (3, 1) and (6, 2), linearly dependent. Right panel:
linearly independent.


The arguments in Example 3 show that you can always decide by inspection when a 
set of two vectors is linearly dependent. Row operations are unnecessary. Simply check 
whether at least one of the vectors is a scalar times the other. (The test applies only to 
sets of two vectors.) 


A set of two vectors {v1, v2} is linearly dependent if at least one of the vectors is
a multiple of the other. The set is linearly independent if and only if neither of the
vectors is a multiple of the other.


In geometric terms, two vectors are linearly dependent if and only if they lie on the 
same line through the origin. Figure 1 shows the vectors from Example 3. 


Sets of Two or More Vectors 


The proof of the next theorem is similar to the solution of Example 3. Details are given 
at the end of this section. 


THEOREM 7 Characterization of Linearly Dependent Sets 

An indexed set S = {v1, ..., vp} of two or more vectors is linearly dependent if
and only if at least one of the vectors in S is a linear combination of the others. In
fact, if S is linearly dependent and v1 ≠ 0, then some vj (with j > 1) is a linear
combination of the preceding vectors, v1, ..., v(j-1).


Warning: Theorem 7 does not say that every vector in a linearly dependent set is a 
linear combination of the preceding vectors. A vector in a linearly dependent set may 
fail to be a linear combination of the other vectors. See Practice Problem 3. 


EXAMPLE 4 Let u = (3, 1, 0) and v = (1, 6, 0). Describe the set spanned by u and v,
and explain why a vector w is in Span {u, v} if and only if {u, v, w} is linearly dependent.


SOLUTION The vectors u and v are linearly independent because neither vector is a
multiple of the other, and so they span a plane in R^3. (See Section 1.3.) In fact,
Span {u, v} is the x1x2-plane (with x3 = 0). If w is a linear combination of u and v,
then {u, v, w} is linearly dependent, by Theorem 7. Conversely, suppose that {u, v, w}
is linearly dependent. By Theorem 7, some vector in {u, v, w} is a linear combination
of the preceding vectors (since u ≠ 0). That vector must be w, since v is not a multiple
of u. So w is in Span {u, v}. See Fig. 2. ■



FIGURE 2 Linear dependence in R^3. Left panel: linearly dependent, w in Span{u, v}.
Right panel: linearly independent, w not in Span{u, v}.
Example 4 generalizes to any set {u, v, w} in R^3 with u and v linearly independent.
The set {u, v, w} will be linearly dependent if and only if w is in the plane spanned by
u and v.

The next two theorems describe special cases in which the linear dependence of a 
set is automatic. Moreover, Theorem 8 will be a key result for work in later chapters. 


THEOREM 8 If a set contains more vectors than there are entries in each vector, then the set
is linearly dependent. That is, any set {v1, ..., vp} in R^n is linearly dependent if
p > n.

FIGURE 3 If p > n, the columns are linearly dependent.

FIGURE 4 A linearly dependent set in R^2.


PROOF Let A = [v1 ··· vp]. Then A is n × p, and the equation Ax = 0 corresponds
to a system of n equations in p unknowns. If p > n, there are more variables
than equations, so there must be a free variable. Hence Ax = 0 has a nontrivial solution,
and the columns of A are linearly dependent. See Fig. 3 for a matrix version of this
theorem. ■


Warning: Theorem 8 says nothing about the case in which the number of vectors in 
the set does not exceed the number of entries in each vector. 


EXAMPLE 5 The vectors (2, 1), (4, -1), (-2, 2) are linearly dependent by Theorem 8,
because there are three vectors in the set and there are only two entries in each vector.
Notice, however, that none of the vectors is a multiple of one of the other vectors. See
Fig. 4. ■
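Theorem 8 guarantees a nontrivial solution of Ax = 0 but does not exhibit one. As a sketch of how one could be found, the code below row reduces to reduced echelon form, sets one free variable to 1, and reads off the basic variables. The function name `nontrivial_solution` is hypothetical, and the matrix assumes the three vectors of Example 5 are (2, 1), (4, -1), and (-2, 2).

```python
from fractions import Fraction

def nontrivial_solution(rows):
    """Return a nonzero x with Ax = 0, or None if only the trivial solution exists."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_cols = []
    for col in range(n_cols):
        r0 = len(pivot_cols)          # row where the next pivot would go
        if r0 == n_rows:
            break
        src = next((r for r in range(r0, n_rows) if m[r][col] != 0), None)
        if src is None:
            continue
        m[r0], m[src] = m[src], m[r0]
        pivot = m[r0][col]
        m[r0] = [a / pivot for a in m[r0]]            # scale the pivot to 1
        for r in range(n_rows):
            if r != r0 and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[r0])]
        pivot_cols.append(col)
    free_cols = [c for c in range(n_cols) if c not in pivot_cols]
    if not free_cols:
        return None
    # Set the first free variable to 1 and the other free variables to 0.
    free = free_cols[0]
    x = [Fraction(0)] * n_cols
    x[free] = Fraction(1)
    for row, col in enumerate(pivot_cols):
        x[col] = -m[row][free]
    return x

# Columns are the three vectors of Example 5 (as assumed above).
A = [[2, 4, -2],
     [1, -1, 2]]
x = nontrivial_solution(A)
print([int(t) for t in x])  # [-1, 1, 1]
```

With these weights the dependence relation reads -(2, 1) + (4, -1) + (-2, 2) = (0, 0).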


THEOREM 9 If a set S = {v1, ..., vp} in R^n contains the zero vector, then the set is linearly
dependent.


PROOF By renumbering the vectors, we may suppose v1 = 0. Then the equation
1v1 + 0v2 + ··· + 0vp = 0 shows that S is linearly dependent. ■

EXAMPLE 6 Determine by inspection if the given set is linearly dependent.

a. (1, 7, 6), (2, 0, 9), (3, 1, 5), (4, 1, 8)

b. (2, 3, 5), (0, 0, 0), (1, 1, 8)

c. (…, 4, 6, …), (…, -6, -9, …)

SOLUTION

a. The set contains four vectors, each of which has only three entries. So the set is
linearly dependent by Theorem 8.

b. Theorem 8 does not apply here because the number of vectors does not exceed the
number of entries in each vector. Since the zero vector is in the set, the set is linearly
dependent by Theorem 9.

c. Compare the corresponding entries of the two vectors. The second vector seems to
be -3/2 times the first vector. This relation holds for the first three pairs of entries,
but fails for the fourth pair. Thus neither of the vectors is a multiple of the other, and
hence they are linearly independent. ■
SG Mastering: Linear Independence 1-31


In general, you should read a section thoroughly several times to absorb an 
important concept such as linear independence. The notes in the Study Guide for 
this section will help you learn to form mental images of key ideas in linear algebra. 
For instance, the following proof is worth reading carefully because it shows how the 
definition of linear independence can be used. 

PROOF OF THEOREM 7 (Characterization of Linearly Dependent Sets) 

If some vj in S equals a linear combination of the other vectors, then vj can be
subtracted from both sides of the equation, producing a linear dependence relation
with a nonzero weight (-1) on vj. [For instance, if v1 = c2v2 + c3v3, then
0 = (-1)v1 + c2v2 + c3v3 + 0v4 + ··· + 0vp.] Thus S is linearly dependent.

Conversely, suppose S is linearly dependent. If v1 is zero, then it is a (trivial)
linear combination of the other vectors in S. Otherwise, v1 ≠ 0, and there exist weights
c1, ..., cp, not all zero, such that

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p = \mathbf{0} \]

Let j be the largest subscript for which cj ≠ 0. If j = 1, then c1v1 = 0, which is
impossible because v1 ≠ 0. So j > 1, and

\[
\begin{aligned}
c_1\mathbf{v}_1 + \cdots + c_j\mathbf{v}_j + 0\mathbf{v}_{j+1} + \cdots + 0\mathbf{v}_p &= \mathbf{0} \\
c_j\mathbf{v}_j &= -c_1\mathbf{v}_1 - \cdots - c_{j-1}\mathbf{v}_{j-1} \\
\mathbf{v}_j &= \left(-\frac{c_1}{c_j}\right)\mathbf{v}_1 + \cdots + \left(-\frac{c_{j-1}}{c_j}\right)\mathbf{v}_{j-1}
\end{aligned}
\]
■


PRACTICE PROBLEMS 



Let u = (3, 2, -4), v = (-6, 1, 7), w = (0, -5, 2), and z = (3, 7, -5).

1. Are the sets {u, v}, {u, w}, {u, z}, {v, w}, {v, z}, and {w, z} each linearly
independent? Why or why not?

2. Does the answer to Problem 1 imply that {u, v, w, z} is linearly independent? 

3. To determine if {u, v, w, z} is linearly dependent, is it wise to check if, say, w is a 
linear combination of u, v, and z? 

4. Is {u, v, w, z} linearly dependent? 


1.7 EXERCISES 

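In Exercises 1-4, determine if the vectors are linearly independent.
Justify each answer.

1. (5, 0, 0), (7, 2, -6), (9, 4, -8)

2. (0, 2, 3), (0, 0, -8), (-1, 3, 1)

In Exercises 5-8, determine if the columns of the matrix form a
linearly independent set. Justify each answer.

5. \[ \begin{bmatrix} 0 & -3 & 9 \\ \vdots & \vdots & \vdots \\ 1 & -4 & -2 \end{bmatrix} \]

6. \[ \begin{bmatrix} -4 & -3 & 0 \\ \vdots & \vdots & \vdots \\ 2 & 1 & -10 \end{bmatrix} \]

7. \[ \begin{bmatrix} 1 & 4 & -3 & 0 \\ -2 & -7 & 5 & 1 \\ -4 & -5 & 7 & 5 \end{bmatrix} \]

8. \[ \begin{bmatrix} 1 & -2 & 3 & 2 \\ -2 & 4 & -6 & 2 \\ 0 & 1 & -1 & 3 \end{bmatrix} \]

In Exercises 9 and 10, (a) for what values of h is v3 in
Span{v1, v2}, and (b) for what values of h is {v1, v2, v3} linearly
dependent? Justify each answer.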
9. v1 = (1, -3, 2), v2 = (-3, 9, -6), v3 = (5, -7, h)

10. v1 = (1, -3, -5), v2 = (-3, 9, 15), v3 = (2, -5, h)


In Exercises 11-14, find the value(s) of h for which the vectors
are linearly dependent. Justify each answer.

11. (2, -2, 4), (4, -6, 7), (-2, 2, h)

12. (3, -6, 1), (-6, 4, -3), (9, h, 3)

13. (1, 5, -3), (-2, -9, 6), (3, h, -9)

14. (1, -2, -4), (-3, 7, 6), (2, 1, h)


Determine by inspection whether the vectors in Exercises 15-20
are linearly independent. Justify each answer.

15. …

16. (2, -4, 8), (-3, 6, -12)

17. (5, -3, -1), (0, 0, 0), (-7, 2, 4)

18. (3, 4), (-1, 5), (3, 5), …

19. (-8, 12, -4), (2, -3, -1)

20. (1, 4, -7), (-2, 5, 3), (0, 0, 0)


In Exercises 21 and 22, mark each statement True or False. Justify 
each answer on the basis of a careful reading of the text. 


21. a. The columns of a matrix A are linearly independent if the
equation Ax = 0 has the trivial solution.

b. If S is a linearly dependent set, then each vector is a linear
combination of the other vectors in S.

c. The columns of any 4 × 5 matrix are linearly dependent.

d. If x and y are linearly independent, and if {x, y, z} is
linearly dependent, then z is in Span {x, y}.

22. a. If u and v are linearly independent, and if w is in
Span {u, v}, then {u, v, w} is linearly dependent.

b. If three vectors in R^3 lie in the same plane in R^3, then
they are linearly dependent.

c. If a set contains fewer vectors than there are entries in the
vectors, then the set is linearly independent.

d. If a set in R^n is linearly dependent, then the set contains
more than n vectors.

In Exercises 23-26, describe the possible echelon forms of the
matrix. Use the notation of Example 1 in Section 1.2.

23. A is a 2 × 2 matrix with linearly dependent columns.

24. A is a 3 × 3 matrix with linearly independent columns.

25. A is a 4 × 2 matrix, A = [a1 a2], and a2 is not a multiple of
a1.

26. A is a 4 × 3 matrix, A = [a1 a2 a3], such that {a1, a2} is
linearly independent and a3 is not in Span {a1, a2}.

27. How many pivot columns must a 6 × 4 matrix have if its
columns are linearly independent? Why?

28. How many pivot columns must a 4 × 6 matrix have if its
columns span R^4? Why?

29. Construct 3 × 2 matrices A and B such that Ax = 0 has a
nontrivial solution, but Bx = 0 has only the trivial solution.

30. a. Fill in the blank in the following statement: "If A is
an m × n matrix, then the columns of A are linearly
independent if and only if A has ____ pivot columns."

b. Explain why the statement in (a) is true.


Exercises 31 and 32 should be solved without performing row 
operations. [Hint: Write Ax = 0 as a vector equation.] 


31. Given A = …, observe that the third column
is the sum of the first two columns. Find a nontrivial solution
of Ax = 0.

32. Given

\[ A = \begin{bmatrix} 4 & 3 & -5 \\ -2 & -2 & 4 \\ -2 & -3 & 7 \end{bmatrix} \]

observe that the first column
minus three times the second column equals the third column.
Find a nontrivial solution of Ax = 0.


Each statement in Exercises 33-38 is either true (in all cases) 
or false (for at least one example). If false, construct a specific 
example to show that the statement is not always true. Such 
an example is called a counterexample to the statement. If a 
statement is true, give a justification. (One specific example 
cannot explain why a statement is always true. You will have to 
do more work here than in Exercises 21 and 22.) 

33. If v1, ..., v4 are in R^4 and v3 = 2v1 + v2, then {v1, v2, v3, v4}
is linearly dependent.

34. If v1 and v2 are in R^4 and v2 is not a scalar multiple of v1,
then {v1, v2} is linearly independent.

35. If v1, ..., v5 are in R^5 and v3 = 0, then {v1, v2, v3, v4, v5} is
linearly dependent.

36. If v1, v2, v3 are in R^3 and v3 is not a linear combination of
v1, v2, then {v1, v2, v3} is linearly independent.

37. If v1, ..., v4 are in R^4 and {v1, v2, v3} is linearly dependent,
then {v1, v2, v3, v4} is also linearly dependent.

38. If {v1, ..., v4} is a linearly independent set of vectors in R^4,
then {v1, v2, v3} is also linearly independent. [Hint: Think
about x1v1 + x2v2 + x3v3 + 0 · v4 = 0.]

39. Suppose A is an m × n matrix with the property that for all b
in R^m the equation Ax = b has at most one solution. Use the
definition of linear independence to explain why the columns
of A must be linearly independent.

40. Suppose an m × n matrix A has n pivot columns. Explain
why for each b in R^m the equation Ax = b has at most one
solution. [Hint: Explain why Ax = b cannot have infinitely
many solutions.]

[M] In Exercises 41 and 42, use as many columns of A as possible
to construct a matrix B with the property that the equation Bx = 0
has only the trivial solution. Solve Bx = 0 to verify your work.

41. \[ A = \begin{bmatrix} 3 & -4 & 10 & 7 & -4 \\ -5 & -3 & -7 & -11 & 15 \\ 4 & 3 & 5 & 2 & 1 \\ 8 & -7 & 23 & 4 & 15 \end{bmatrix} \]

42. \[ A = \begin{bmatrix} 12 & 10 & -6 & 8 & 4 & -14 \\ -7 & -6 & 4 & -5 & -7 & 9 \\ 9 & 9 & -9 & 9 & 9 & -18 \\ -4 & -3 & -1 & 0 & -8 & 1 \\ 8 & 7 & -5 & 6 & 1 & -11 \end{bmatrix} \]

43. [M] With A and B as in Exercise 41, select a column v of A
that was not used in the construction of B and determine if
v is in the set spanned by the columns of B. (Describe your
calculations.)

44. [M] Repeat Exercise 43 with the matrices A and B from
Exercise 42. Then give an explanation for what you discover,
assuming that B was constructed as specified.



SOLUTIONS TO PRACTICE PROBLEMS 


1. Yes. In each case, neither vector is a multiple of the other. Thus each set is linearly 
independent. 

2. No. The observation in Practice Problem 1, by itself, says nothing about the linear
independence of {u, v, w, z}.

3. No. When testing for linear independence, it is usually a poor idea to check if one
selected vector is a linear combination of the others. It may happen that the selected
vector is not a linear combination of the others and yet the whole set of vectors is
linearly dependent. In this practice problem, w is not a linear combination of u, v,
and z.

4. Yes, by Theorem 8. There are more vectors (four) than entries (three) in them. 
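
The claims in Solutions 3 and 4 can be checked by hand. The following worked computation is an added illustration, not part of the original solutions; it uses u = (3, 2, -4), v = (-6, 1, 7), w = (0, -5, 2), and z = (3, 7, -5) as read from the practice problems.

```latex
\[
3\mathbf{u} + \mathbf{v}
= \begin{bmatrix} 9 \\ 6 \\ -12 \end{bmatrix} + \begin{bmatrix} -6 \\ 1 \\ 7 \end{bmatrix}
= \begin{bmatrix} 3 \\ 7 \\ -5 \end{bmatrix} = \mathbf{z},
\qquad\text{so}\qquad
3\mathbf{u} + \mathbf{v} + 0\mathbf{w} - \mathbf{z} = \mathbf{0}.
\]
% z lies in Span{u, v}, so w is a combination of u, v, z only if a u + b v = w.
\[
\begin{aligned}
3a - 6b &= 0 \\
2a + b &= -5
\end{aligned}
\;\Longrightarrow\; a = -2,\ b = -1,
\qquad\text{but}\qquad
-4a + 7b = 8 - 7 = 1 \neq 2 .
\]
```

The first line exhibits a dependence relation in which w has weight zero; the second shows that no choice of weights produces w, so w is not a linear combination of u, v, and z even though the set is dependent.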


1.8 INTRODUCTION TO LINEAR TRANSFORMATIONS 


The difference between a matrix equation Ax = b and the associated vector equation
x1a1 + ··· + xnan = b is merely a matter of notation. However, a matrix equation
Ax = b can arise in linear algebra (and in applications such as computer graphics and
signal processing) in a way that is not directly connected with linear combinations of
vectors. This happens when we think of the matrix A as an object that "acts" on a vector
x by multiplication to produce a new vector called Ax.

For instance, the equations

\[ A\mathbf{x} = \mathbf{b} \qquad\text{and}\qquad A\mathbf{u} = \mathbf{0} \]

say that multiplication by A transforms x into b and transforms u into the zero vector.
See Fig. 1.

FIGURE 1 Transforming vectors via matrix multiplication.

From this new point of view, solving the equation Ax = b amounts to finding
all vectors x in R^4 that are transformed into the vector b in R^2 under the "action" of
multiplication by A.

The correspondence from x to Ax is a function from one set of vectors to another.
This concept generalizes the common notion of a function as a rule that transforms one
real number into another.

A transformation (or function or mapping) T from R^n to R^m is a rule that assigns
to each vector x in R^n a vector T(x) in R^m. The set R^n is called the domain of T, and
R^m is called the codomain of T. The notation T : R^n → R^m indicates that the domain
of T is R^n and the codomain is R^m. For x in R^n, the vector T(x) in R^m is called the
image of x (under the action of T). The set of all images T(x) is called the range of T.
See Fig. 2.

FIGURE 2 Domain, codomain, and range of T : R^n → R^m.


The new terminology in this section is important because a dynamic view of 
matrix-vector multiplication is the key to understanding several ideas in linear algebra 
and to building mathematical models of physical systems that evolve over time. Such 
dynamical systems will be discussed in Sections 1.10, 4.8, and 4.9 and throughout 
Chapter 5. 


Matrix Transformations 

The rest of this section focuses on mappings associated with matrix multiplication. For
each x in R^n, T(x) is computed as Ax, where A is an m × n matrix. For simplicity, we
sometimes denote such a matrix transformation by x ↦ Ax. Observe that the domain
of T is R^n when A has n columns and the codomain of T is R^m when each column of
A has m entries. The range of T is the set of all linear combinations of the columns of
A, because each image T(x) is of the form Ax.
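EXAMPLE 1 Let

\[
A = \begin{bmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{bmatrix},\quad
\mathbf{u} = \begin{bmatrix} 2 \\ -1 \end{bmatrix},\quad
\mathbf{b} = \begin{bmatrix} 3 \\ 2 \\ -5 \end{bmatrix},\quad
\mathbf{c} = \begin{bmatrix} 3 \\ 2 \\ 5 \end{bmatrix}
\]

and define a transformation T : R^2 → R^3 by T(x) = Ax, so that

\[
T(\mathbf{x}) = A\mathbf{x}
= \begin{bmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} x_1 - 3x_2 \\ 3x_1 + 5x_2 \\ -x_1 + 7x_2 \end{bmatrix}
\]

a. Find T(u), the image of u under the transformation T.

b. Find an x in R^2 whose image under T is b.

c. Is there more than one x whose image under T is b?

d. Determine if c is in the range of the transformation T.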

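SOLUTION

a. Compute

\[
T(\mathbf{u}) = A\mathbf{u}
= \begin{bmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{bmatrix}\begin{bmatrix} 2 \\ -1 \end{bmatrix}
= \begin{bmatrix} 5 \\ 1 \\ -9 \end{bmatrix}
\]

b. Solve T(x) = b for x. That is, solve Ax = b, or

\[
\begin{bmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 3 \\ 2 \\ -5 \end{bmatrix}
\tag{1}
\]

Using the method discussed in Section 1.4, row reduce the augmented matrix:

\[
\begin{bmatrix} 1 & -3 & 3 \\ 3 & 5 & 2 \\ -1 & 7 & -5 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -3 & 3 \\ 0 & 14 & -7 \\ 0 & 4 & -2 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -3 & 3 \\ 0 & 1 & -.5 \\ 0 & 0 & 0 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 1.5 \\ 0 & 1 & -.5 \\ 0 & 0 & 0 \end{bmatrix}
\tag{2}
\]

Hence x1 = 1.5, x2 = -.5, and x = (1.5, -.5). The image of this x under T is the
given vector b.

c. Any x whose image under T is b must satisfy equation (1). From (2), it is clear that
equation (1) has a unique solution. So there is exactly one x whose image is b.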

d. The vector c is in the range of T if c is the image of some x in R^2, that is, if c = T(x)
for some x. This is just another way of asking if the system Ax = c is consistent. To
find the answer, row reduce the augmented matrix:

\[
\begin{bmatrix} 1 & -3 & 3 \\ 3 & 5 & 2 \\ -1 & 7 & 5 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -3 & 3 \\ 0 & 14 & -7 \\ 0 & 4 & 8 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -3 & 3 \\ 0 & 1 & 2 \\ 0 & 14 & -7 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -3 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & -35 \end{bmatrix}
\]

The third equation, 0 = -35, shows that the system is inconsistent. So c is not in
the range of T. ■
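The four parts of Example 1 can be replayed numerically. The sketch below is not the row-reduction method of the text: because the first two rows of A form an invertible 2 × 2 system, it solves those two equations by Cramer's rule and then tests the third equation. The names `T` and `preimage` are hypothetical helpers, and the data A, u, b, c are taken from the example as reconstructed from its solution.

```python
from fractions import Fraction as F

A = [[1, -3], [3, 5], [-1, 7]]

def T(x):
    """Image of x under the matrix transformation x -> Ax."""
    return [sum(F(a) * xi for a, xi in zip(row, x)) for row in A]

def preimage(target):
    """Return the unique x with T(x) = target, or None if target is not in the range."""
    (a, b), (c, d) = A[0], A[1]
    det = F(a * d - b * c)               # 14 for this A, so rows 1-2 determine x uniquely
    x1 = (target[0] * d - b * target[1]) / det
    x2 = (a * target[1] - c * target[0]) / det
    x = [x1, x2]
    # Any solution must satisfy the first two equations, so x is the only candidate.
    return x if T(x) == [F(t) for t in target] else None

print(T([2, -1]))            # image of u: 5, 1, -9
print(preimage([3, 2, -5]))  # x = (3/2, -1/2), the only x whose image is b
print(preimage([3, 2, 5]))   # None: c is not in the range of T
```

The uniqueness in part (c) shows up here as the nonzero determinant, and the inconsistency in part (d) as the failure of the third equation.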


The question in Example 1(c) is a uniqueness problem for a system of linear
equations, translated here into the language of matrix transformations: Is b the image
of a unique x in R^n? Similarly, Example 1(d) is an existence problem: Does there exist
an x whose image is c?

The next two matrix transformations can be viewed geometrically. They reinforce 
the dynamic view of a matrix as something that transforms vectors into other vectors. 
Section 2.7 contains other interesting examples connected with computer graphics. 
FIGURE 3 A projection transformation.

EXAMPLE 2 If

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

then the transformation x ↦ Ax projects points in R^3 onto the x1x2-plane because

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\mapsto
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix}
\]

See Fig. 3. ■


EXAMPLE 3 Let

\[ A = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} \]

The transformation T : R^2 → R^2 defined by
T(x) = Ax is called a shear transformation. It can be shown that if T acts on each
point in the 2 × 2 square shown in Fig. 4, then the set of images forms the shaded
parallelogram. The key idea is to show that T maps line segments onto line segments
(as shown in Exercise 27) and then to check that the corners of the square map onto
the vertices of the parallelogram. For instance, the image of the point u = … is
T(u) = …, and the image of (2, 2) is

\[ \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 2 \end{bmatrix} \]

T deforms the square as if the top of the square were pushed to the right while the base is
held fixed. Shear transformations appear in physics, geology, and crystallography. ■
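To see the parallelogram emerge, one can push the corners of a square through the shear. This is a minimal sketch under two assumptions: the shear matrix is [1 3; 0 1], the matrix that appears in the image computation of Example 3, and the 2 × 2 square is taken to have corners (0, 0), (2, 0), (2, 2), (0, 2), since the figure itself is not reproduced here. The function name `shear` is hypothetical.

```python
def shear(x, k=3):
    """Apply the matrix [[1, k], [0, 1]] to the point x = (x1, x2)."""
    x1, x2 = x
    return (x1 + k * x2, x2)

# Assumed corners of the 2 x 2 square, listed counterclockwise.
corners = [(0, 0), (2, 0), (2, 2), (0, 2)]
images = [shear(p) for p in corners]
print(images)  # [(0, 0), (2, 0), (8, 2), (6, 2)]
```

The two base corners are fixed and the two top corners slide 6 units to the right, which matches the description of the top being pushed while the base is held fixed.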





FIGURE 4 A shear transformation.


Linear Transformations 

Theorem 5 in Section 1.4 shows that if A is m × n, then the transformation x ↦ Ax has
the properties

\[ A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} \qquad\text{and}\qquad A(c\mathbf{u}) = cA\mathbf{u} \]

for all u, v in R^n and all scalars c. These properties, written in function notation, identify
the most important class of transformations in linear algebra.

DEFINITION A transformation (or mapping) T is linear if:

(i) T(u + v) = T(u) + T(v) for all u, v in the domain of T;

(ii) T(cu) = cT(u) for all scalars c and all u in the domain of T.

Every matrix transformation is a linear transformation. Important examples of 
linear transformations that are not matrix transformations will be discussed in Chapters 
4 and 5. 
Linear transformations preserve the operations of vector addition and scalar
multiplication. Property (i) says that the result T(u + v) of first adding u and v in R^n and
then applying T is the same as first applying T to u and to v and then adding T(u) and
T(v) in R^m. These two properties lead easily to the following useful facts.

If T is a linear transformation, then

\[ T(\mathbf{0}) = \mathbf{0} \tag{3} \]

and

\[ T(c\mathbf{u} + d\mathbf{v}) = cT(\mathbf{u}) + dT(\mathbf{v}) \tag{4} \]

for all vectors u, v in the domain of T and all scalars c, d.

Property (3) follows from condition (ii) in the definition, because T(0) = T(0u) =
0T(u) = 0. Property (4) requires both (i) and (ii):

\[ T(c\mathbf{u} + d\mathbf{v}) = T(c\mathbf{u}) + T(d\mathbf{v}) = cT(\mathbf{u}) + dT(\mathbf{v}) \]

Observe that if a transformation satisfies (4) for all u, v and c, d, it must be linear.
(Set c = d = 1 for preservation of addition, and set d = 0 for preservation of scalar
multiplication.) Repeated application of (4) produces a useful generalization:

\[ T(c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p) = c_1T(\mathbf{v}_1) + \cdots + c_pT(\mathbf{v}_p) \tag{5} \]

In engineering and physics, (5) is referred to as a superposition principle. Think
of v1, ..., vp as signals that go into a system and T(v1), ..., T(vp) as the responses of
that system to the signals. The system satisfies the superposition principle if whenever
an input is expressed as a linear combination of such signals, the system's response is
the same linear combination of the responses to the individual signals. We will return
to this idea in Chapter 4.
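The superposition principle (5) is easy to test numerically for a matrix transformation. In the sketch below the matrix is the A of Example 1, while the signals and weights are made-up sample values chosen only for illustration; `matvec` and `combo` are hypothetical helper names.

```python
def matvec(A, x):
    """Compute Ax for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def combo(weights, vectors):
    """Linear combination c1*v1 + ... + cp*vp."""
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(weights, vectors)) for i in range(n)]

A = [[1, -3], [3, 5], [-1, 7]]
signals = [[2, -1], [0, 4], [5, 5]]   # sample inputs v1, v2, v3 (illustrative)
weights = [3, -2, 1]                  # sample weights c1, c2, c3 (illustrative)

response_to_combination = matvec(A, combo(weights, signals))
combination_of_responses = combo(weights, [matvec(A, v) for v in signals])
print(response_to_combination)   # [29, 3, -53]
print(combination_of_responses)  # [29, 3, -53]
```

One agreeing example does not prove linearity, of course; it only illustrates what (5) asserts for every choice of weights and inputs.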

EXAMPLE 4 Given a scalar r, define T : R^2 → R^2 by T(x) = rx. T is called a
contraction when 0 ≤ r ≤ 1 and a dilation when r > 1. Let r = 3, and show that T
is a linear transformation.

SOLUTION Let u, v be in R^2 and let c, d be scalars. Then

\[
\begin{aligned}
T(c\mathbf{u} + d\mathbf{v}) &= 3(c\mathbf{u} + d\mathbf{v}) && \text{Definition of } T \\
&= 3c\mathbf{u} + 3d\mathbf{v} && \text{Vector arithmetic} \\
&= c(3\mathbf{u}) + d(3\mathbf{v}) && \text{Vector arithmetic} \\
&= cT(\mathbf{u}) + dT(\mathbf{v})
\end{aligned}
\]

Thus T is a linear transformation because it satisfies (4). See Fig. 5. ■

FIGURE 5 A dilation transformation.
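EXAMPLE 5 Define a linear transformation T : R^2 → R^2 by

\[
T(\mathbf{x}) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix}
\]

Find the images under T of u = (4, 1), v = (2, 3), and u + v = (6, 4).

SOLUTION

\[
\begin{aligned}
T(\mathbf{u}) &= \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 4 \end{bmatrix}, \qquad
T(\mathbf{v}) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} -3 \\ 2 \end{bmatrix}, \\
T(\mathbf{u} + \mathbf{v}) &= \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 6 \\ 4 \end{bmatrix} = \begin{bmatrix} -4 \\ 6 \end{bmatrix}
\end{aligned}
\]

Note that T(u + v) is obviously equal to T(u) + T(v). It appears from Fig. 6 that
T rotates u, v, and u + v counterclockwise about the origin through 90°. In fact, T
transforms the entire parallelogram determined by u and v into the one determined by
T(u) and T(v). (See Exercise 28.) ■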
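A quick numerical check of Example 5 is sketched below. Besides recomputing the three images, it tests two facts that are consistent with a 90° rotation: each image is perpendicular to its input and has the same squared length. The function name `rotate90` is hypothetical; u = (4, 1) and v = (2, 3) are the vectors used in the solution.

```python
def rotate90(x):
    """Apply the matrix [[0, -1], [1, 0]] to x = (x1, x2)."""
    x1, x2 = x
    return (-x2, x1)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

u, v = (4, 1), (2, 3)
s = (u[0] + v[0], u[1] + v[1])          # u + v = (6, 4)

Tu, Tv, Ts = rotate90(u), rotate90(v), rotate90(s)
print(Tu, Tv, Ts)                        # (-1, 4) (-3, 2) (-4, 6)
print(Ts == (Tu[0] + Tv[0], Tu[1] + Tv[1]))  # True: T(u + v) = T(u) + T(v)
print(dot(u, Tu), dot(u, u) == dot(Tu, Tu))  # 0 True: perpendicular, same length
```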


FIGURE 6 A rotation transformation.


The final example is not geometrical; instead, it shows how a linear mapping can 
transform one type of data into another. 


EXAMPLE 6 A company manufactures two products, B and C. Using data from
Example 7 in Section 1.3, we construct a "unit cost" matrix, U = [b c], whose
columns describe the "costs per dollar of output" for the products (column 1 is
product B, column 2 is product C):

\[
U = \begin{bmatrix} .45 & .40 \\ .25 & .30 \\ .15 & .15 \end{bmatrix}
\begin{matrix} \text{Materials} \\ \text{Labor} \\ \text{Overhead} \end{matrix}
\]

Let x = (x1, x2) be a "production" vector, corresponding to x1 dollars of product B and
x2 dollars of product C, and define T : R^2 → R^3 by

\[
T(\mathbf{x}) = U\mathbf{x}
= x_1\begin{bmatrix} .45 \\ .25 \\ .15 \end{bmatrix} + x_2\begin{bmatrix} .40 \\ .30 \\ .15 \end{bmatrix}
= \begin{bmatrix} \text{Total cost of materials} \\ \text{Total cost of labor} \\ \text{Total cost of overhead} \end{bmatrix}
\]

The mapping T transforms a list of production quantities (measured in dollars) into a
list of total costs. The linearity of this mapping is reflected in two ways:

1. If production is increased by a factor of, say, 4, from x to 4x, then the costs will
increase by the same factor, from T(x) to 4T(x).

2. If x and y are production vectors, then the total cost vector associated with the
combined production x + y is precisely the sum of the cost vectors T(x) and
T(y). ■
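The two linearity properties of the cost mapping can be illustrated with sample numbers. In this sketch the unit-cost columns are b = (.45, .25, .15) and c = (.40, .30, .15), as in the displayed formula for T(x); the production vectors are hypothetical values chosen for the illustration, and `total_costs` is a hypothetical helper name. Exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction

# Costs per dollar of output (materials, labor, overhead) for products B and C.
b = [Fraction("0.45"), Fraction("0.25"), Fraction("0.15")]
c = [Fraction("0.40"), Fraction("0.30"), Fraction("0.15")]

def total_costs(x):
    """T(x) = x1*b + x2*c for a production vector x = (x1, x2) in dollars."""
    x1, x2 = x
    return [x1 * bi + x2 * ci for bi, ci in zip(b, c)]

x = (100, 200)   # hypothetical: $100 of product B and $200 of product C
y = (50, 10)     # a second hypothetical production vector

print([float(t) for t in total_costs(x)])                 # [125.0, 85.0, 45.0]
print(total_costs((400, 800)) == [4 * t for t in total_costs(x)])   # True
combined = total_costs((x[0] + y[0], x[1] + y[1]))
print(combined == [s + t for s, t in zip(total_costs(x), total_costs(y))])  # True
```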

PRACTICE PROBLEMS 

1. Suppose T : R^5 → R^2 and T(x) = Ax for some matrix A and for each x in R^5. How
many rows and columns does A have?

2. Let

\[ A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \]

Give a geometric description of the transformation x ↦ Ax.

3. The line segment from 0 to a vector u is the set of points of the form tu, where
0 ≤ t ≤ 1. Show that a linear transformation T maps this segment into the segment
between 0 and T(u).


1.8 EXERCISES 


1. Let A = …, and define T : R^2 → R^2 by T(x) = Ax.
Find the images under T of u = (…, -3) and v = ….

2. Let

\[ A = \begin{bmatrix} \ldots & 0 & 0 \\ 0 & \ldots & 0 \\ 0 & 0 & \ldots \end{bmatrix} \]

u = (3, 6, -9), and v = (a, b, c).
Define T : R^3 → R^3 by T(x) = Ax. Find T(u) and T(v).

In Exercises 3-6, with T defined by T(x) = Ax, find a vector x
whose image under T is b, and determine whether x is unique.

3. \[ A = \begin{bmatrix} 1 & 0 & -3 \\ -3 & 1 & 6 \\ 2 & -2 & -1 \end{bmatrix},\quad \mathbf{b} = \begin{bmatrix} -2 \\ 3 \\ -1 \end{bmatrix} \]

4. A = …, b = …

5. A = …, b = …

6. \[ A = \begin{bmatrix} 1 & -3 & 2 \\ 3 & -8 & 8 \\ 0 & 1 & 2 \\ 1 & 0 & 8 \end{bmatrix},\quad \mathbf{b} = \begin{bmatrix} 1 \\ 6 \\ 3 \\ 10 \end{bmatrix} \]

7. Let A be a 6 × 5 matrix. What must a and b be in order to
define T : R^a → R^b by T(x) = Ax?

8. How many rows and columns must a matrix A have in order
to define a mapping from R^5 into R^7 by the rule T(x) = Ax?

For Exercises 9 and 10, find all x in R^4 that are mapped into the
zero vector by the transformation x ↦ Ax for the given matrix A.

9. \[ A = \begin{bmatrix} \ldots & -3 & 5 & -5 \\ \ldots & 1 & -3 & 5 \\ \ldots & -4 & 4 & -4 \end{bmatrix} \]

10. \[ A = \begin{bmatrix} 3 & 2 & 10 & \ldots \\ 1 & 0 & 2 & \ldots \\ 0 & 1 & 2 & \ldots \\ 1 & 4 & 10 & \ldots \end{bmatrix} \]

11. Let b = (-1, 1, 0), and let A be the matrix in Exercise 9. Is b
in the range of the linear transformation x ↦ Ax? Why or
why not?

12. Let b = (…, 4), and let A be the matrix in Exercise 10. Is
b in the range of the linear transformation x ↦ Ax? Why or
why not?

In Exercises 13-16, use a rectangular coordinate system to plot
u = (5, 2), v = (-2, 4), and their images under the given
transformation T. (Make a separate and reasonably large sketch for each
exercise.) Describe geometrically what T does to each vector x
in R^2.

13. T(x) = …

14. T(x) = …

15. T(x) = …

16. T(x) = …

17. Let T : R^2 → R^2 be a linear transformation that maps u = …
into … and maps v = … into …. Use the fact
that T is linear to find the images under T of 2u, 3v, and
2u + 3v.
18. The figure shows vectors u, v, and w, along with the images
T(u) and T(v) under the action of a linear transformation
T : R^2 → R^2. Copy this figure carefully, and draw the image
T(w) as accurately as possible. [Hint: First, write w as a
linear combination of u and v.]

19. Let e1 = …, e2 = …, y1 = …, and y2 = …, and
let T : R^2 → R^2 be a linear transformation that maps e1 into
y1 and maps e2 into y2. Find the images of (…, -3)
and (x1, x2).

20. Let x = (x1, x2), v1 = …, and v2 = …, and let
T : R^2 → R^2 be a linear transformation that maps x into
x1v1 + x2v2. Find a matrix A such that T(x) is Ax for each x.

In Exercises 21 and 22, mark each statement True or False. Justify 
each answer. 


21. a. A linear transformation is a special type of function.

b. If A is a 3 × 5 matrix and T is a transformation defined
by T(x) = Ax, then the domain of T is R^3.

c. If A is an m × n matrix, then the range of the
transformation x ↦ Ax is R^m.

d. Every linear transformation is a matrix transformation.

e. A transformation T is linear if and only if
T(c1v1 + c2v2) = c1T(v1) + c2T(v2)
for all v1 and v2 in the domain of T and for all scalars c1
and c2.

22. a. The range of the transformation x ↦ Ax is the set of all
linear combinations of the columns of A.

b. Every matrix transformation is a linear transformation.

c. If T : R^n → R^m is a linear transformation and if c is in
R^m, then a uniqueness question is "Is c in the range of
T?"

d. A linear transformation preserves the operations of vector
addition and scalar multiplication.

e. A linear transformation T : R^n → R^m always maps the
origin of R^n to the origin of R^m.

23. Define f : R → R by f(x) = mx + b.

a. Show that f is a linear transformation when b = 0.

b. Find a property of a linear transformation that is violated
when b ≠ 0.

c. Why is f called a linear function?

24. An affine transformation T : R^n → R^m has the form T(x) =
Ax + b, with A an m × n matrix and b in R^m. Show
that T is not a linear transformation when b ≠ 0. (Affine
transformations are important in computer graphics.)

25. Given v ≠ 0 and p in R^n, the line through p in the direction of
v has the parametric equation x = p + tv. Show that a linear
transformation T : R^n → R^n maps this line onto another line
or onto a single point (a degenerate line).

26. a. Show that the line through vectors p and q in R^n may be
written in the parametric form x = (1 - t)p + tq. (Refer
to the figure with Exercises 21 and 22 in Section 1.5.)

b. The line segment from p to q is the set of points of the
form (1 - t)p + tq for 0 ≤ t ≤ 1 (as shown in the figure
below). Show that a linear transformation T maps this
line segment onto a line segment or onto a single point.

[Figure: the segment from p (t = 0) to q (t = 1) through the point (1 - t)p + tq,
and its image with endpoints T(p) and T(q).]

27. Let u and v be linearly independent vectors in R^3, and let P
be the plane through u, v, and 0. The parametric equation
of P is x = su + tv (with s, t in R). Show that a linear
transformation T : R^3 → R^3 maps P onto a plane through 0,
or onto a line through 0, or onto just the origin in R^3. What
must be true about T(u) and T(v) in order for the image of
the plane P to be a plane?

28. Let u and v be vectors in R^n. It can be shown that the set P of
all points in the parallelogram determined by u and v has the
form au + bv, for 0 ≤ a ≤ 1, 0 ≤ b ≤ 1. Let T : R^n → R^m
be a linear transformation. Explain why the image of a point
in P under the transformation T lies in the parallelogram
determined by T(u) and T(v).

29. Let T : R^2 → R^2 be the linear transformation that reflects
each point through the x1-axis. Make two sketches similar
to Fig. 6 that illustrate properties (i) and (ii) of a linear
transformation.

30. Suppose vectors v1, ..., vp span R^n, and let T : R^n → R^n
be a linear transformation. Suppose T(vi) = 0 for i =
1, ..., p. Show that T is the zero transformation. That is,
show that if x is any vector in R^n, then T(x) = 0.

31. Let T : R^n → R^m be a linear transformation, and let
{v1, v2, v3} be a linearly dependent set in R^n. Explain why
the set {T(v1), T(v2), T(v3)} is linearly dependent.



In Exercises 32-36, column vectors are written as rows, such as
$\mathbf{x} = (x_1, x_2)$, and $T(\mathbf{x})$ is written as $T(x_1, x_2)$.

32. Show that the transformation T defined by $T(x_1, x_2) =
(x_1 - 2|x_2|, x_1 - 4x_2)$ is not linear.

33. Show that the transformation T defined by $T(x_1, x_2) =
(x_1 - 2x_2, x_1 - 3, 2x_1 - 5x_2)$ is not linear.
34. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the transformation that reflects each
vector $\mathbf{x} = (x_1, x_2, x_3)$ through the plane $x_3 = 0$ onto
$T(\mathbf{x}) = (x_1, x_2, -x_3)$. Show that T is a linear transformation.
[See Example 4 for ideas.]

35. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the transformation that projects each
vector $\mathbf{x} = (x_1, x_2, x_3)$ onto the plane $x_2 = 0$, so $T(\mathbf{x}) =
(x_1, 0, x_3)$. Show that T is a linear transformation.


36. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Suppose {u, v}
is a linearly independent set, but {T(u), T(v)} is a linearly
dependent set. Show that $T(\mathbf{x}) = \mathbf{0}$ has a nontrivial solution.
[Hint: Use the fact that $c_1 T(\mathbf{u}) + c_2 T(\mathbf{v}) = \mathbf{0}$ for some
weights $c_1$ and $c_2$, not both zero.]


[M] In Exercises 37 and 38, the given matrix determines a linear 
transformation T. Find all x such that T (x) = 0. 


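37. $\begin{bmatrix} 2 & 3 & 5 & -5 \\ -7 & 7 & 0 & 0 \\ -3 & 4 & 1 & 3 \\ -9 & 3 & -6 & -4 \end{bmatrix}$

38. $\begin{bmatrix} 3 & 4 & -7 & 0 \\ 5 & -8 & 7 & 4 \\ 6 & -8 & 6 & 4 \\ 9 & -7 & -2 & 0 \end{bmatrix}$

39.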


[M] Let b = 5 and let A be the matrix in Exercise 37. 

_-3_ 

Is b in the range of the transformation $\mathbf{x} \mapsto A\mathbf{x}$? If so, find
an x whose image under the transformation is b.


40. [M] Let b 


and let A be the matrix in Exercise 38. 


Is b in the range of the transformation $\mathbf{x} \mapsto A\mathbf{x}$? If so, find
an x whose image under the transformation is b.


SG Mastering: Linear Transformations 1-34 


[Figure: The transformation $\mathbf{x} \mapsto A\mathbf{x}$.]


SOLUTIONS TO PRACTICE PROBLEMS 


1. A must have five columns for Ax to be defined. A must have two rows for the 
codomain of T to be R 2 . 

2. Plot some random points (vectors) on graph paper to see what happens. A point such
as (4, 1) maps into (4, -1). The transformation $\mathbf{x} \mapsto A\mathbf{x}$ reflects points through the
x-axis (or $x_1$-axis).

3. Let $\mathbf{x} = t\mathbf{u}$ for some t such that $0 \le t \le 1$. Since T is linear, $T(t\mathbf{u}) = t\,T(\mathbf{u})$, which
is a point on the line segment between 0 and T(u).


1.9 THE MATRIX OF A LINEAR TRANSFORMATION 


Whenever a linear transformation T arises geometrically or is described in words, we
usually want a “formula” for $T(\mathbf{x})$. The discussion that follows shows that every linear
transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is actually a matrix transformation $\mathbf{x} \mapsto A\mathbf{x}$ and that
important properties of T are intimately related to familiar properties of A. The key to
finding A is to observe that T is completely determined by what it does to the columns
of the $n \times n$ identity matrix $I_n$.


[Figure: the vectors $\mathbf{e}_1 = (1, 0)$ and $\mathbf{e}_2 = (0, 1)$ in the $x_1x_2$-plane.]

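EXAMPLE 1 The columns of $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ are $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Suppose
T is a linear transformation from $\mathbb{R}^2$ into $\mathbb{R}^3$ such that

\[ T(\mathbf{e}_1) = \begin{bmatrix} 5 \\ -7 \\ 2 \end{bmatrix} \quad \text{and} \quad T(\mathbf{e}_2) = \begin{bmatrix} -3 \\ 8 \\ 0 \end{bmatrix} \]

With no additional information, find a formula for the image of an arbitrary x in $\mathbb{R}^2$.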
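SOLUTION Write

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \end{bmatrix} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 \tag{1} \]

Since T is a linear transformation,

\[ T(\mathbf{x}) = x_1 T(\mathbf{e}_1) + x_2 T(\mathbf{e}_2) \tag{2} \]

\[ = x_1 \begin{bmatrix} 5 \\ -7 \\ 2 \end{bmatrix} + x_2 \begin{bmatrix} -3 \\ 8 \\ 0 \end{bmatrix} = \begin{bmatrix} 5x_1 - 3x_2 \\ -7x_1 + 8x_2 \\ 2x_1 + 0 \end{bmatrix} \]

■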


The step from equation (1) to equation (2) explains why knowledge of $T(\mathbf{e}_1)$ and
$T(\mathbf{e}_2)$ is sufficient to determine $T(\mathbf{x})$ for any x. Moreover, since (2) expresses $T(\mathbf{x})$ as
a linear combination of vectors, we can put these vectors into the columns of a matrix
A and write (2) as

\[ T(\mathbf{x}) = \begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = A\mathbf{x} \]


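THEOREM 10 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Then there exists a unique matrix
A such that

\[ T(\mathbf{x}) = A\mathbf{x} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n \]

In fact, A is the $m \times n$ matrix whose jth column is the vector $T(\mathbf{e}_j)$, where $\mathbf{e}_j$ is
the jth column of the identity matrix in $\mathbb{R}^n$:

\[ A = \begin{bmatrix} T(\mathbf{e}_1) & \cdots & T(\mathbf{e}_n) \end{bmatrix} \tag{3} \]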


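PROOF Write $\mathbf{x} = I_n\mathbf{x} = [\mathbf{e}_1 \ \cdots \ \mathbf{e}_n]\mathbf{x} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n$, and use the linearity
of T to compute

\[ T(\mathbf{x}) = T(x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n) = x_1 T(\mathbf{e}_1) + \cdots + x_n T(\mathbf{e}_n) \]

\[ = \begin{bmatrix} T(\mathbf{e}_1) & \cdots & T(\mathbf{e}_n) \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = A\mathbf{x} \]

The uniqueness of A is treated in Exercise 33. ■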


The matrix A in (3) is called the standard matrix for the linear transformation T.

We know now that every linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be viewed as
a matrix transformation, and vice versa. The term linear transformation focuses on a
property of a mapping, while matrix transformation describes how such a mapping is
implemented, as Examples 2 and 3 illustrate.
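The recipe in Theorem 10 can be tried out numerically. The following Python sketch is an illustration only, not part of the text's development: the helper names `standard_matrix` and `apply` are hypothetical, and the function `T` simply encodes the transformation of Example 1 entry by entry. The sketch builds A column by column from the images $T(\mathbf{e}_j)$ and then checks that $A\mathbf{x}$ agrees with $T(\mathbf{x})$ for one sample vector.

```python
def standard_matrix(T, n):
    """Return the standard matrix of T: R^n -> R^m as a list of rows.

    Column j is T(e_j), where e_j is the j-th column of the n x n identity.
    """
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(list(T(e_j)))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply(A, x):
    """Compute the matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def T(x):
    """The transformation of Example 1, written out entry by entry."""
    x1, x2 = x
    return [5 * x1 - 3 * x2, -7 * x1 + 8 * x2, 2 * x1]


A = standard_matrix(T, 2)
print(A)                 # [[5, -3], [-7, 8], [2, 0]]
print(apply(A, [2, 1]))  # [7, -6, 4], the same as T([2, 1])
```

Agreement on a single vector is only a spot check; the theorem itself is what guarantees $T(\mathbf{x}) = A\mathbf{x}$ for every x once T is known to be linear.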

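EXAMPLE 2 Find the standard matrix A for the dilation transformation $T(\mathbf{x}) = 3\mathbf{x}$,
for x in $\mathbb{R}^2$.

SOLUTION Write

\[ T(\mathbf{e}_1) = 3\mathbf{e}_1 = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \quad \text{and} \quad T(\mathbf{e}_2) = 3\mathbf{e}_2 = \begin{bmatrix} 0 \\ 3 \end{bmatrix} \]

\[ A = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} \]

■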


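EXAMPLE 3 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the transformation that rotates each point in
$\mathbb{R}^2$ about the origin through an angle $\varphi$, with counterclockwise rotation for a positive
angle. We could show geometrically that such a transformation is linear. (See Fig. 6 in
Section 1.8.) Find the standard matrix A of this transformation.

SOLUTION $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ rotates into $\begin{bmatrix} \cos\varphi \\ \sin\varphi \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ rotates into $\begin{bmatrix} -\sin\varphi \\ \cos\varphi \end{bmatrix}$. See Fig. 1.
By Theorem 10,

\[ A = \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix} \]

Example 5 in Section 1.8 is a special case of this transformation, with $\varphi = \pi/2$. ■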
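As a numerical check of the rotation matrix, the short Python sketch below (an illustration with hypothetical helper names `rotation_matrix` and `apply`) builds the matrix for an angle given in radians and applies it to $\mathbf{e}_1$ and $\mathbf{e}_2$ for the special case $\varphi = \pi/2$. Floating-point arithmetic is used, so the results are only approximately (0, 1) and (-1, 0).

```python
import math


def rotation_matrix(phi):
    """Standard matrix of the counterclockwise rotation through phi radians."""
    return [[math.cos(phi), -math.sin(phi)],
            [math.sin(phi),  math.cos(phi)]]


def apply(A, x):
    """Compute the matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = rotation_matrix(math.pi / 2)
image_e1 = apply(A, [1, 0])   # approximately (0, 1)
image_e2 = apply(A, [0, 1])   # approximately (-1, 0)
print(image_e1, image_e2)
```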


[Figure: (1, 0) rotates into $(\cos\varphi, \sin\varphi)$ and (0, 1) rotates into $(-\sin\varphi, \cos\varphi)$.]
FIGURE 1 A rotation transformation.


Geometric Linear Transformations of R 2 



FIGURE 2 The unit square.

Examples 2 and 3 illustrate linear transformations that are described geometrically.
Tables 1-4 illustrate other common geometric linear transformations of the plane.
Because the transformations are linear, they are determined completely by what they
do to the columns of $I_2$. Instead of showing only the images of $\mathbf{e}_1$ and $\mathbf{e}_2$, the tables
show what a transformation does to the unit square (Fig. 2).

Other transformations can be constructed from those listed in Tables 1-4 by 
applying one transformation after another. For instance, a horizontal shear could be 
followed by a reflection in the $x_2$-axis. Section 2.1 will show that such a composition
of linear transformations is linear. (Also, see Exercise 34.) 


Existence and Uniqueness Questions 

The concept of a linear transformation provides a new way to understand the existence 
and uniqueness questions asked earlier. The two definitions following Tables 1-4 give
the appropriate terminology for transformations. 
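TABLE 1 Reflections

Transformation                               Standard Matrix
Reflection through the $x_1$-axis            $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$
Reflection through the $x_2$-axis            $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$
Reflection through the line $x_2 = -x_1$     $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$
Reflection through the origin                $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$

[Figures: image of the unit square under each reflection.]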
TABLE 2 Contractions and Expansions

Transformation                               Standard Matrix
Horizontal contraction and expansion         $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$
Vertical contraction and expansion           $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$

[Figures: image of the unit square for $0 < k < 1$ and for $k > 1$.]

TABLE 3 Shears

Transformation                               Standard Matrix
Horizontal shear                             $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$
Vertical shear                               $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$

[Figures: image of the unit square for $k < 0$ and for $k > 0$.]
TABLE 4 Projections

Transformation                               Standard Matrix
Projection onto the $x_1$-axis               $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$
Projection onto the $x_2$-axis               $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$

[Figures: image of the unit square under each projection.]


DEFINITION

A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ is said to be onto $\mathbb{R}^m$ if each b in $\mathbb{R}^m$ is the image of
at least one x in $\mathbb{R}^n$.


Equivalently, T is onto $\mathbb{R}^m$ when the range of T is all of the codomain $\mathbb{R}^m$. That is,
T maps $\mathbb{R}^n$ onto $\mathbb{R}^m$ if, for each b in the codomain $\mathbb{R}^m$, there exists at least one solution
of $T(\mathbf{x}) = \mathbf{b}$. “Does T map $\mathbb{R}^n$ onto $\mathbb{R}^m$?” is an existence question. The mapping T is
not onto when there is some b in $\mathbb{R}^m$ for which the equation $T(\mathbf{x}) = \mathbf{b}$ has no solution.
See Fig. 3.



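[Figure panels: T is not onto $\mathbb{R}^m$; T is onto $\mathbb{R}^m$.]
FIGURE 3 Is the range of T all of $\mathbb{R}^m$?


DEFINITION

A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ is said to be one-to-one if each b in $\mathbb{R}^m$ is the image
of at most one x in $\mathbb{R}^n$.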
Equivalently, T is one-to-one if, for each b in $\mathbb{R}^m$, the equation $T(\mathbf{x}) = \mathbf{b}$ has either
a unique solution or none at all. “Is T one-to-one?” is a uniqueness question. The
mapping T is not one-to-one when some b in $\mathbb{R}^m$ is the image of more than one vector
in $\mathbb{R}^n$. If there is no such b, then T is one-to-one. See Fig. 4.




FIGURE 4 Is every b the image of at most one vector?

Mastering: Existence and Uniqueness 1-39

The projection transformations shown in Table 4 are not one-to-one and do not map
$\mathbb{R}^2$ onto $\mathbb{R}^2$. The transformations in Tables 1, 2, and 3 are one-to-one and do map $\mathbb{R}^2$
onto $\mathbb{R}^2$. Other possibilities are shown in the two examples below.

Example 4 and the theorems that follow show how the function properties of being 
one-to-one and mapping onto are related to important concepts studied earlier in this 
chapter. 


EXAMPLE 4 Let T be the linear transformation whose standard matrix is

\[ A = \begin{bmatrix} 1 & -4 & 8 & 1 \\ 0 & 2 & -1 & 3 \\ 0 & 0 & 0 & 5 \end{bmatrix} \]

Does T map $\mathbb{R}^4$ onto $\mathbb{R}^3$? Is T a one-to-one mapping?

SOLUTION Since A happens to be in echelon form, we can see at once that A has a 
pivot position in each row. By Theorem 4 in Section 1.4, for each b in R 3 , the equation 
Ax = b is consistent. In other words, the linear transformation T maps R 4 (its domain) 
onto R 3 . However, since the equation Ax = b has a free variable (because there are 
four variables and only three basic variables), each b is the image of more than one x. 
That is, T is not one-to-one. ■ 


THEOREM 11 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Then T is one-to-one if and only if
the equation $T(\mathbf{x}) = \mathbf{0}$ has only the trivial solution.


PROOF Since T is linear, $T(\mathbf{0}) = \mathbf{0}$. If T is one-to-one, then the equation $T(\mathbf{x}) = \mathbf{0}$
has at most one solution and hence only the trivial solution. If T is not one-to-one, then
there is a b that is the image of at least two different vectors in $\mathbb{R}^n$ (say, u and v). That
is, $T(\mathbf{u}) = \mathbf{b}$ and $T(\mathbf{v}) = \mathbf{b}$. But then, since T is linear,

\[ T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v}) = \mathbf{b} - \mathbf{b} = \mathbf{0} \]

The vector $\mathbf{u} - \mathbf{v}$ is not zero, since $\mathbf{u} \ne \mathbf{v}$. Hence the equation $T(\mathbf{x}) = \mathbf{0}$ has more than
one solution. So, either the two conditions in the theorem are both true or they are both
false. ■
THEOREM 12 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation and let A be the standard matrix for
T. Then:

a. T maps $\mathbb{R}^n$ onto $\mathbb{R}^m$ if and only if the columns of A span $\mathbb{R}^m$;

b. T is one-to-one if and only if the columns of A are linearly independent.


PROOF 

a. By Theorem 4 in Section 1.4, the columns of A span $\mathbb{R}^m$ if and only if for each b
in $\mathbb{R}^m$ the equation $A\mathbf{x} = \mathbf{b}$ is consistent; in other words, if and only if for every b,
the equation $T(\mathbf{x}) = \mathbf{b}$ has at least one solution. This is true if and only if T maps
$\mathbb{R}^n$ onto $\mathbb{R}^m$.

b. The equations $T(\mathbf{x}) = \mathbf{0}$ and $A\mathbf{x} = \mathbf{0}$ are the same except for notation. So, by
Theorem 11, T is one-to-one if and only if $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
This happens if and only if the columns of A are linearly independent, as was already
noted in the boxed statement (3) in Section 1.7. ■


[Figure: The transformation T is not onto $\mathbb{R}^3$.]


Statement (a) in Theorem 12 is equivalent to the statement “T maps $\mathbb{R}^n$ onto $\mathbb{R}^m$
if and only if every vector in $\mathbb{R}^m$ is a linear combination of the columns of A.” See
Theorem 4 in Section 1.4.

In the next example and in some exercises that follow, column vectors are written in
rows, such as $\mathbf{x} = (x_1, x_2)$, and $T(\mathbf{x})$ is written as $T(x_1, x_2)$ instead of the more formal
$T((x_1, x_2))$.


EXAMPLE 5 Let $T(x_1, x_2) = (3x_1 + x_2, 5x_1 + 7x_2, x_1 + 3x_2)$. Show that T is a
one-to-one linear transformation. Does T map $\mathbb{R}^2$ onto $\mathbb{R}^3$?


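SOLUTION When x and $T(\mathbf{x})$ are written as column vectors, you can determine the
standard matrix of T by inspection, visualizing the row-vector computation of each
entry in $A\mathbf{x}$.

\[ T(\mathbf{x}) = \begin{bmatrix} 3x_1 + x_2 \\ 5x_1 + 7x_2 \\ x_1 + 3x_2 \end{bmatrix} = \begin{bmatrix} ? & ? \\ ? & ? \\ ? & ? \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \underbrace{\begin{bmatrix} 3 & 1 \\ 5 & 7 \\ 1 & 3 \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{4} \]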


So T is indeed a linear transformation, with its standard matrix A shown in (4). The
columns of A are linearly independent because neither is a multiple of the other. By Theorem
12(b), T is one-to-one. To decide if T is onto $\mathbb{R}^3$, examine the span of the columns of
A. Since A is $3 \times 2$, the columns of A span $\mathbb{R}^3$ if and only if A has 3 pivot positions,
by Theorem 4. This is impossible, since A has only 2 columns. So the columns of A do
not span $\mathbb{R}^3$, and the associated linear transformation is not onto $\mathbb{R}^3$. ■
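Theorem 12 turns both questions into a count of pivot positions: T is onto $\mathbb{R}^m$ exactly when A has a pivot in every row, and T is one-to-one exactly when A has a pivot in every column. The Python sketch below is one possible way to automate that count with exact rational arithmetic; the function names `pivot_count`, `is_onto`, and `is_one_to_one` are hypothetical, and the two matrices are the standard matrices of Examples 4 and 5.

```python
from fractions import Fraction


def pivot_count(A):
    """Number of pivot positions of A, found by forward elimination."""
    M = [[Fraction(v) for v in row] for row in A]
    m, n = len(M), len(M[0])
    row = 0
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the current row.
        p = next((r for r in range(row, m) if M[r][col] != 0), None)
        if p is None:
            continue  # no pivot in this column
        M[row], M[p] = M[p], M[row]
        for r in range(row + 1, m):
            factor = M[r][col] / M[row][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[row])]
        row += 1
        if row == m:
            break
    return row


def is_onto(A):
    """True when A has a pivot position in every row."""
    return pivot_count(A) == len(A)


def is_one_to_one(A):
    """True when A has a pivot position in every column."""
    return pivot_count(A) == len(A[0])


A4 = [[1, -4, 8, 1], [0, 2, -1, 3], [0, 0, 0, 5]]   # Example 4
A5 = [[3, 1], [5, 7], [1, 3]]                       # Example 5

print(is_onto(A4), is_one_to_one(A4))   # True False
print(is_onto(A5), is_one_to_one(A5))   # False True
```

Exact fractions are used so that a zero entry is recognized as exactly zero; with floating-point entries a tolerance would be needed instead.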


PRACTICE PROBLEM 

Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the transformation that first performs a horizontal shear that maps
$\mathbf{e}_2$ into $\mathbf{e}_2 - .5\mathbf{e}_1$ (but leaves $\mathbf{e}_1$ unchanged) and then reflects the result through the
$x_2$-axis. Assuming that T is linear, find its standard matrix. [Hint: Determine the final
location of the images of $\mathbf{e}_1$ and $\mathbf{e}_2$.]
1.9 EXERCISES


In Exercises 1-10, assume that T is a linear transformation. Find 
the standard matrix of T . 

1. $T : \mathbb{R}^2 \to \mathbb{R}^4$, $T(\mathbf{e}_1) = (3, 1, 3, 1)$, and $T(\mathbf{e}_2) = (-5, 2, 0, 0)$,
where $\mathbf{e}_1 = (1, 0)$ and $\mathbf{e}_2 = (0, 1)$.

2. $T : \mathbb{R}^3 \to \mathbb{R}^2$, $T(\mathbf{e}_1) = (1, 4)$, $T(\mathbf{e}_2) = (-2, 9)$, and
$T(\mathbf{e}_3) = (3, -8)$, where $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ are the columns of
the $3 \times 3$ identity matrix.

3. $T : \mathbb{R}^2 \to \mathbb{R}^2$ is a vertical shear transformation that maps $\mathbf{e}_1$
into $\mathbf{e}_1 - 3\mathbf{e}_2$, but leaves $\mathbf{e}_2$ unchanged.

4. $T : \mathbb{R}^2 \to \mathbb{R}^2$ is a horizontal shear transformation that leaves
$\mathbf{e}_1$ unchanged and maps $\mathbf{e}_2$ into $\mathbf{e}_2 + 2\mathbf{e}_1$.

5. $T : \mathbb{R}^2 \to \mathbb{R}^2$ rotates points (about the origin) through $\pi/2$
radians (counterclockwise).

6. $T : \mathbb{R}^2 \to \mathbb{R}^2$ rotates points (about the origin) through
$-3\pi/2$ radians (clockwise).

7. $T : \mathbb{R}^2 \to \mathbb{R}^2$ first rotates points through $-3\pi/4$ radians
(clockwise) and then reflects points through the horizontal
$x_1$-axis. [Hint: $T(\mathbf{e}_1) = (-1/\sqrt{2}, 1/\sqrt{2})$.]

8. $T : \mathbb{R}^2 \to \mathbb{R}^2$ first performs a horizontal shear that transforms
$\mathbf{e}_2$ into $\mathbf{e}_2 + 2\mathbf{e}_1$ (leaving $\mathbf{e}_1$ unchanged) and then reflects
points through the line $x_2 = -x_1$.

9. $T : \mathbb{R}^2 \to \mathbb{R}^2$ first reflects points through the horizontal
$x_1$-axis and then rotates points $-\pi/2$ radians.

10. $T : \mathbb{R}^2 \to \mathbb{R}^2$ first reflects points through the horizontal
$x_1$-axis and then reflects points through the line $x_2 = x_1$.

11. A linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ first reflects points
through the $x_1$-axis and then reflects points through the
$x_2$-axis. Show that T can also be described as a linear transformation
that rotates points about the origin. What is the angle
of that rotation?

12. Show that the transformation in Exercise 10 is merely a
rotation about the origin. What is the angle of the rotation?

13. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation such that $T(\mathbf{e}_1)$
and $T(\mathbf{e}_2)$ are the vectors shown in the figure. Using the
figure, sketch the vector $T(2, 1)$.

[Figure: the vectors $T(\mathbf{e}_1)$ and $T(\mathbf{e}_2)$.]


14. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation with standard
matrix $A = [\mathbf{a}_1 \ \mathbf{a}_2]$, where $\mathbf{a}_1$ and $\mathbf{a}_2$ are shown in the
figure at the top of column 2. Using the figure, draw the

image of under the transformation T. 

_L 



In Exercises 15 and 16, fill in the missing entries of the matrix, 
assuming that the equation holds for all values of the variables. 



"? ? 9" 

■ 


2xi — 4x2 

15. 

? ? ? 

x 2 

二 

X\ — 


? ? ? 

X3 


_ X 2 3^3 


16. ? 

? 


「 -n 


3x\ — 2^2 

Xi 

[_X 2 _ 

= 

X\ + 4^2 



_ _ 


In Exercises 17-20, show that T is a linear transformation by
finding a matrix that implements the mapping. Note that $x_1, x_2, \ldots$
are not vectors but are entries in vectors.


17. T(xi,x 2 ,X 3 ,x 4 ) = (Xi + 2x 2 , 0, 2x2 + x 2 — x 4 ) 

18. $T(x_1, x_2) = (x_1 + 4x_2, 0, x_1 - 3x_2, x_1)$

19. $T(x_1, x_2, x_3) = (x_1 - 5x_2 + 4x_3, x_2 - 6x_3)$

20. $T(x_1, x_2, x_3, x_4) = 3x_1 + 4x_3 - 2x_4$ (Notice: $T : \mathbb{R}^4 \to \mathbb{R}$)

21. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation such that
$T(x_1, x_2) = (x_1 + x_2, 4x_1 + 5x_2)$. Find x such that $T(\mathbf{x}) =
(3, 8)$.

22. Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ be a linear transformation with
$T(x_1, x_2) = (2x_1 - x_2, -3x_1 + x_2, 2x_1 - 3x_2)$. Find x such
that $T(\mathbf{x}) = (0, -1, -4)$.


In Exercises 23 and 24, mark each statement True or False. Justify 
each answer. 

23. a. A linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is completely determined
by its effect on the columns of the $n \times n$ identity
matrix.

b. If $T : \mathbb{R}^2 \to \mathbb{R}^2$ rotates vectors about the origin through
an angle $\varphi$, then T is a linear transformation.

c. When two linear transformations are performed one after
another, the combined effect may not always be a linear
transformation.

d. A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ is onto $\mathbb{R}^m$ if every vector x in
$\mathbb{R}^n$ maps onto some vector in $\mathbb{R}^m$.

e. If A is a $3 \times 2$ matrix, then the transformation $\mathbf{x} \mapsto A\mathbf{x}$
cannot be one-to-one.

24. a. If A is a $4 \times 3$ matrix, then the transformation $\mathbf{x} \mapsto A\mathbf{x}$
maps $\mathbb{R}^3$ onto $\mathbb{R}^4$.
b. Every linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a matrix
transformation.

c. The columns of the standard matrix for a linear transformation
from $\mathbb{R}^n$ to $\mathbb{R}^m$ are the images of the columns of
the $n \times n$ identity matrix under T.

d. A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ is one-to-one if each vector in
$\mathbb{R}^n$ maps onto a unique vector in $\mathbb{R}^m$.

e. The standard matrix of a horizontal shear transformation
from $\mathbb{R}^2$ to $\mathbb{R}^2$ has the form $\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$, where a and d
are $\pm 1$.


In Exercises 25-28, determine if the specified linear transforma¬ 
tion is (a) one-to-one and (b) onto. Justify each answer. 

25. The transformation in Exercise 17 

26. The transformation in Exercise 2 

27. The transformation in Exercise 19 

28. The transformation in Exercise 14


In Exercises 29 and 30, describe the possible echelon forms of the 
standard matrix for a linear transformation T. Use the notation of 
Example 1 in Section 1.2. 

29. $T : \mathbb{R}^3 \to \mathbb{R}^4$ is one-to-one.     30. $T : \mathbb{R}^4 \to \mathbb{R}^3$ is onto.

31. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, with A its
standard matrix. Complete the following statement to make
it true: “T is one-to-one if and only if A has ____ pivot
columns.” Explain why the statement is true. [Hint: Look
in the exercises for Section 1.7.]

32. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, with A its
standard matrix. Complete the following statement to make
it true: “T maps $\mathbb{R}^n$ onto $\mathbb{R}^m$ if and only if A has ____
pivot columns.” Find some theorems that explain why the
statement is true.


33. Verify the uniqueness of A in Theorem 10. Let $T : \mathbb{R}^n \to \mathbb{R}^m$
be a linear transformation such that $T(\mathbf{x}) = B\mathbf{x}$ for some
$m \times n$ matrix B. Show that if A is the standard matrix for
T, then A = B. [Hint: Show that A and B have the same
columns.]

34. Let $S : \mathbb{R}^p \to \mathbb{R}^n$ and $T : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations.
Show that the mapping $\mathbf{x} \mapsto T(S(\mathbf{x}))$ is a linear transformation
(from $\mathbb{R}^p$ to $\mathbb{R}^m$). [Hint: Compute $T(S(c\mathbf{u} + d\mathbf{v}))$
for u, v in $\mathbb{R}^p$ and scalars c and d. Justify each step of
the computation, and explain why this computation gives the
desired conclusion.]

35. If a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$,
can you give a relation between m and n? If T is one-to-one,
what can you say about m and n?

36. Why is the question “Is the linear transformation T onto?” 
an existence question? 

[M] In Exercises 37-40, let T be the linear transformation whose 
standard matrix is given. In Exercises 37 and 38, decide if T is 
a one-to-one mapping. In Exercises 39 and 40, decide if T maps 
R 5 onto R 5 . Justify your answers. 


37. 

"-5 

8 

6 

3 

-5 

-3 

—6 

8 


38. 

■ 7 

5 

5 

6 

9 

4 

-9" 

-4 

2 

9 

5 

-12 


4 

8 

0 

7 


_-3 

2 

7 

-12_ 



-6 

—6 

6 

5_ 


4 

-7 

3 

7 

5" 







6 

-8 

5 

12 

-8 






39. 

-7 

10 

一 8 

一 9 

14 







3 

-5 

4 

2 

-6 







_-5 

6 

—6 

-7 

3_ 







9 

43 

5 

6 

-1" 







14 

15 

-7 

-5 

4 






40. 

-8 

—6 

12 

-5 

-9 







-5 

—6 

-4 

9 

8 







13 

14 

15 

3 

11 







SOLUTION TO PRACTICE PROBLEM 


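Follow what happens to $\mathbf{e}_1$ and $\mathbf{e}_2$. See Fig. 5. First, $\mathbf{e}_1$ is unaffected by the shear and
then is reflected into $-\mathbf{e}_1$. So $T(\mathbf{e}_1) = -\mathbf{e}_1$. Second, $\mathbf{e}_2$ goes to $\mathbf{e}_2 - .5\mathbf{e}_1$ by the shear
transformation. Since reflection through the $x_2$-axis changes $\mathbf{e}_1$ into $-\mathbf{e}_1$ and leaves
$\mathbf{e}_2$ unchanged, the vector $\mathbf{e}_2 - .5\mathbf{e}_1$ goes to $\mathbf{e}_2 + .5\mathbf{e}_1$. So $T(\mathbf{e}_2) = \mathbf{e}_2 + .5\mathbf{e}_1$. Thus the
standard matrix of T is

\[ \begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) \end{bmatrix} = \begin{bmatrix} -\mathbf{e}_1 & \mathbf{e}_2 + .5\mathbf{e}_1 \end{bmatrix} = \begin{bmatrix} -1 & .5 \\ 0 & 1 \end{bmatrix} \]

[Figure panels: Shear transformation; Reflection through the $x_2$-axis.]
FIGURE 5 The composition of two transformations.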
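The same answer can be checked mechanically by following $\mathbf{e}_1$ and $\mathbf{e}_2$ through the two steps. In the Python sketch below, which is an illustration with hypothetical names, the shear and the reflection are entered as the matrices $\begin{bmatrix} 1 & -.5 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$, T applies the shear first and the reflection second, and the columns of the standard matrix are read off as $T(\mathbf{e}_1)$ and $T(\mathbf{e}_2)$.

```python
from fractions import Fraction


def apply(A, x):
    """Compute the matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


half = Fraction(1, 2)
shear = [[1, -half], [0, 1]]     # e1 -> e1,  e2 -> e2 - .5 e1
reflect = [[-1, 0], [0, 1]]      # reflection through the x2-axis


def T(x):
    """Shear first, then reflect."""
    return apply(reflect, apply(shear, x))


columns = [T([1, 0]), T([0, 1])]
# Arrange the images T(e1), T(e2) as the columns of the standard matrix.
A = [[columns[j][i] for j in range(2)] for i in range(2)]
print(A)   # rows (-1, 1/2) and (0, 1)
```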


1.10 LINEAR MODELS IN BUSINESS, SCIENCE, AND ENGINEERING 

The mathematical models in this section are all linear; that is, each describes a problem
by means of a linear equation, usually in vector or matrix form. The first model concerns 
nutrition but actually is representative of a general technique in linear programming 
problems. The second model comes from electrical engineering. The third model 
introduces the concept of a linear difference equation, a powerful mathematical tool for 
studying dynamic processes in a wide variety of fields such as engineering, ecology, 
economics, telecommunications, and the management sciences. Linear models are 
important because natural phenomena are often linear or nearly linear when the variables 
involved are held within reasonable bounds. Also, linear models are more easily adapted 
for computer calculation than are complex nonlinear models. 

As you read about each model, pay attention to how its linearity reflects some 
property of the system being modeled. 

Constructing a Nutritious Weight-Loss Diet 

The formula for the Cambridge Diet, a popular diet in the 1980s, was based on years
of research. A team of scientists headed by Dr. Alan H. Howard developed this
diet at Cambridge University after more than eight years of clinical work with obese
patients.¹ The very low-calorie powdered formula diet combines a precise balance
of carbohydrate, high-quality protein, and fat, together with vitamins, minerals, trace 
elements, and electrolytes. Millions of persons have used the diet to achieve rapid and 
substantial weight loss. 

To achieve the desired amounts and proportions of nutrients, Dr. Howard had to 
incorporate a large variety of foodstuffs in the diet. Each foodstuff supplied several of 
the required ingredients, but not in the correct proportions. For instance, nonfat milk 
was a major source of protein but contained too much calcium. So soy flour was used for 
part of the protein because soy flour contains little calcium. However, soy flour contains 
proportionally too much fat, so whey was added since it supplies less fat in relation to 
calcium. Unfortunately, whey contains too much carbohydrate. . . .

The following example illustrates the problem on a small scale. Listed in Table 1 
are three of the ingredients in the diet, together with the amounts of certain nutrients 
supplied by 100 grams (g) of each ingredient.²


EXAMPLE 1 If possible, find some combination of nonfat milk, soy flour, and whey 
to provide the exact amounts of protein, carbohydrate, and fat supplied by the diet in 
one day (Table 1). 


¹ The first announcement of this rapid weight-loss regimen was given in the International Journal of Obesity
(1978) 2, 321-332.

² Ingredients in the diet as of 1984; nutrient data for ingredients adapted from USDA Agricultural
Handbooks No. 8-1 and 8-6, 1976.
TABLE 1

                Amounts (g) Supplied per 100 g of Ingredient     Amounts (g) Supplied by
Nutrient        Nonfat milk     Soy flour     Whey               Cambridge Diet in One Day
Protein         36              51            13                 33
Carbohydrate    52              34            74                 45
Fat             0               7             1.1                3


SOLUTION Let $x_1$, $x_2$, and $x_3$, respectively, denote the number of units (100 g) of
these foodstuffs. One approach to the problem is to derive equations for each nutrient
separately. For instance, the product

\[ \left\{ \begin{array}{c} x_1 \text{ units of} \\ \text{nonfat milk} \end{array} \right\} \cdot \left\{ \begin{array}{c} \text{protein per unit} \\ \text{of nonfat milk} \end{array} \right\} \]

gives the amount of protein supplied by $x_1$ units of nonfat milk. To this amount, we
would then add similar products for soy flour and whey and set the resulting sum equal
to the amount of protein we need. Analogous calculations would have to be made for
each nutrient.

A more efficient method, and one that is conceptually simpler, is to consider a
“nutrient vector” for each foodstuff and build just one vector equation. The amount of
nutrients supplied by $x_1$ units of nonfat milk is the scalar multiple

\[ \underbrace{\left\{ \begin{array}{c} x_1 \text{ units of} \\ \text{nonfat milk} \end{array} \right\}}_{\text{Scalar}} \cdot \underbrace{\left\{ \begin{array}{c} \text{nutrients per unit} \\ \text{of nonfat milk} \end{array} \right\}}_{\text{Vector}} = x_1\mathbf{a}_1 \tag{1} \]

where $\mathbf{a}_1$ is the first column in Table 1. Let $\mathbf{a}_2$ and $\mathbf{a}_3$ be the corresponding vectors
for soy flour and whey, respectively, and let b be the vector that lists the total nutrients
required (the last column of the table). Then $x_2\mathbf{a}_2$ and $x_3\mathbf{a}_3$ give the nutrients supplied
by $x_2$ units of soy flour and $x_3$ units of whey, respectively. So the relevant equation is

\[ x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + x_3\mathbf{a}_3 = \mathbf{b} \tag{2} \]

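Row reduction of the augmented matrix for the corresponding system of equations
shows that

\[ \begin{bmatrix} 36 & 51 & 13 & 33 \\ 52 & 34 & 74 & 45 \\ 0 & 7 & 1.1 & 3 \end{bmatrix} \sim \cdots \sim \begin{bmatrix} 1 & 0 & 0 & .277 \\ 0 & 1 & 0 & .392 \\ 0 & 0 & 1 & .233 \end{bmatrix} \]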


To three significant digits, the diet requires .277 units of nonfat milk, .392 units of 
soy flour, and .233 units of whey in order to provide the desired amounts of protein, 
carbohydrate, and fat. ■ 
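The row reduction can be reproduced with a few lines of Python. The sketch below is a minimal illustration, not the method used by the diet's designers: `solve` is a hypothetical helper that performs Gauss-Jordan elimination with exact fractions and assumes the system has a unique solution. The coefficient matrix and right-hand side are the columns of Table 1.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system Ax = b by Gauss-Jordan elimination.

    Assumes A is square and the solution is unique.
    """
    n = len(A)
    # Build the augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(str(v)) for v in row] + [Fraction(str(rhs))]
         for row, rhs in zip(A, b)]
    for col in range(n):
        p = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[p] = M[p], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# Columns: nonfat milk, soy flour, whey.  Rows: protein, carbohydrate, fat.
A = [[36, 51, 13],
     [52, 34, 74],
     [0, 7, 1.1]]
b = [33, 45, 3]

amounts = solve(A, b)
print([f"{float(x):.3f}" for x in amounts])   # ['0.277', '0.392', '0.233']
```

All three amounts come out positive, which is the feasibility condition discussed in the text.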

It is important that the values of $x_1$, $x_2$, and $x_3$ found above are nonnegative. This is
necessary for the solution to be physically feasible. (How could you use —.233 units of 
whey, for instance?) With a large number of nutrient requirements, it may be necessary 
to use a larger number of foodstuffs in order to produce a system of equations with a 
‘‘nonnegative’’ solution. Thus many, many different combinations of foodstuffs may 
need to be examined in order to find a system of equations with such a solution. In 
fact, the manufacturer of the Cambridge Diet was able to supply 31 nutrients in precise 
amounts using only 33 ingredients. 

The diet construction problem leads to the linear equation (2) because the amount 
of nutrients supplied by each foodstuff can be written as a scalar multiple of a vector, as 
in (1). That is, the nutrients supplied by a foodstuff are proportional to the amount of 
the foodstuff added to the diet mixture. Also, each nutrient in the mixture is the sum of
the amounts from the various foodstuffs. 

Problems of formulating specialized diets for humans and livestock occur frequently.
Usually they are treated by linear programming techniques. Our method of
constructing vector equations often simplifies the task of formulating such problems. 

Linear Equations and Electrical Networks 

Current flow in a simple electrical network can be described by a system of linear 
equations. A voltage source such as a battery forces a current of electrons to flow 
through the network. When the current passes through a resistor (such as a lightbulb or 
motor), some of the voltage is “used up”; by Ohm’s law, this “voltage drop” across a
resistor is given by

V = RI

where the voltage V is measured in volts, the resistance R in ohms (denoted by Ω), and
the current flow I in amperes (amps, for short).

The network in Fig. 1 contains three closed loops. The currents flowing in loops 1, 
2, and 3 are denoted by $I_1$, $I_2$, and $I_3$, respectively. The designated directions of such
loop currents are arbitrary. If a current turns out to be negative, then the actual direction 
of current flow is opposite to that chosen in the figure. If the current direction shown is 
away from the positive (longer) side of a battery (-||-) around to the negative (shorter) 
side, the voltage is positive; otherwise, the voltage is negative. 

Current flow in a loop is governed by the following rule. 


KIRCHHOFF'S VOLTAGE LAW 

The algebraic sum of the RI voltage drops in one direction around a loop equals 
the algebraic sum of the voltage sources in the same direction around the loop. 



FIGURE 1 


EXAMPLE 2 Determine the loop currents in the network in Fig. 1. 

SOLUTION For loop 1, the current $I_1$ flows through three resistors, and the sum of the
RI voltage drops is

\[ 4I_1 + 4I_1 + 3I_1 = (4 + 4 + 3)I_1 = 11I_1 \]

Current from loop 2 also flows in part of loop 1, through the short branch between A
and B. The associated RI drop there is $3I_2$ volts. However, the current direction for
the branch AB in loop 1 is opposite to that chosen for the flow in loop 2, so the algebraic
sum of all RI drops for loop 1 is $11I_1 - 3I_2$. Since the voltage in loop 1 is +30 volts,
Kirchhoff’s voltage law implies that

\[ 11I_1 - 3I_2 = 30 \]

The equation for loop 2 is

\[ -3I_1 + 6I_2 - I_3 = 5 \]

The term $-3I_1$ comes from the flow of the loop-1 current through the branch AB (with
a negative voltage drop because the current flow there is opposite to the flow in loop 2).
The term $6I_2$ is the sum of all resistances in loop 2, multiplied by the loop current. The
term $-I_3 = -1 \cdot I_3$ comes from the loop-3 current flowing through the 1-ohm resistor
in branch CD, in the direction opposite to the flow in loop 2. The loop-3 equation is

\[ -I_2 + 3I_3 = -25 \]
Note that the 5-volt battery in branch CD is counted as part of both loop 2 and loop 3,
but it is -5 volts for loop 3 because of the direction chosen for the current in loop 3.
The 20-volt battery is negative for the same reason.

The loop currents are found by solving the system

\[ \begin{aligned} 11I_1 - 3I_2 \phantom{{}+ 3I_3} &= 30 \\ -3I_1 + 6I_2 - I_3 &= 5 \\ -I_2 + 3I_3 &= -25 \end{aligned} \tag{3} \]

Row operations on the augmented matrix lead to the solution: $I_1 = 3$ amps, $I_2 =
1$ amp, and $I_3 = -8$ amps. The negative value of $I_3$ indicates that the actual current
in loop 3 flows in the direction opposite to that shown in Fig. 1. ■
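System (3) is small enough to solve by hand, but it also makes a convenient test case for a program. The Python sketch below is an illustration under the assumption that the system has a unique solution; `solve` is a hypothetical helper that row reduces the augmented matrix with exact fractions. The coefficients and right-hand side are taken directly from system (3).

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system Ax = b by Gauss-Jordan elimination.

    Assumes A is square and the solution is unique.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        p = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[p] = M[p], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


R = [[11, -3, 0],
     [-3, 6, -1],
     [0, -1, 3]]
v = [30, 5, -25]

currents = solve(R, v)
print([int(c) for c in currents])   # [3, 1, -8]
```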

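It is instructive to look at system (3) as a vector equation:

\[ I_1 \underbrace{\begin{bmatrix} 11 \\ -3 \\ 0 \end{bmatrix}}_{\mathbf{r}_1} + I_2 \underbrace{\begin{bmatrix} -3 \\ 6 \\ -1 \end{bmatrix}}_{\mathbf{r}_2} + I_3 \underbrace{\begin{bmatrix} 0 \\ -1 \\ 3 \end{bmatrix}}_{\mathbf{r}_3} = \underbrace{\begin{bmatrix} 30 \\ 5 \\ -25 \end{bmatrix}}_{\mathbf{v}} \tag{4} \]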

The first entry of each vector concerns the first loop, and similarly for the second and
third entries. The first resistor vector $\mathbf{r}_1$ lists the resistance in the various loops through
which current $I_1$ flows. A resistance is written negatively when $I_1$ flows against the
flow direction in another loop. Examine Fig. 1 and see how to compute the entries in
$\mathbf{r}_1$; then do the same for $\mathbf{r}_2$ and $\mathbf{r}_3$. The matrix form of equation (4),

\[ R\mathbf{i} = \mathbf{v}, \quad \text{where } R = \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{r}_3 \end{bmatrix} \text{ and } \mathbf{i} = \begin{bmatrix} I_1 \\ I_2 \\ I_3 \end{bmatrix} \]

provides a matrix version of Ohm’s law. If all loop currents are chosen in the same
direction (say, counterclockwise), then all entries off the main diagonal of R will be
negative.

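The matrix equation $R\mathbf{i} = \mathbf{v}$ makes the linearity of this model easy to see at a glance.
For instance, if the voltage vector is doubled, then the current vector must double. Also,
a superposition principle holds. That is, the solution of equation (4) is the sum of the
solutions of the equations

\[ R\mathbf{i} = \begin{bmatrix} 30 \\ 0 \\ 0 \end{bmatrix}, \quad R\mathbf{i} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}, \quad \text{and} \quad R\mathbf{i} = \begin{bmatrix} 0 \\ 0 \\ -25 \end{bmatrix} \]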

Each equation here corresponds to the circuit with only one voltage source (the other 
sources being replaced by wires that close each loop). The model for current flow is 
linear precisely because Ohm’s law and Kirchhoff’s law are linear: The voltage drop 
across a resistor is proportional to the current flowing through it (Ohm), and the sum of 
the voltage drops in a loop equals the sum of the voltage sources in the loop (Kirchhoff). 
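Both linearity claims can be verified for the network of Example 2. The Python sketch below is an illustration with a hypothetical helper `solve` (exact Gauss-Jordan elimination, assuming a unique solution). It solves $R\mathbf{i} = \mathbf{v}$ for each voltage source separately, adds the three current vectors, and compares the sum with the solution for all sources together; it also doubles the voltage vector.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system Ax = b by Gauss-Jordan elimination.

    Assumes A is square and the solution is unique.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        p = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[p] = M[p], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


R = [[11, -3, 0], [-3, 6, -1], [0, -1, 3]]

# One voltage source at a time, the others replaced by wires.
sources = [[30, 0, 0], [0, 5, 0], [0, 0, -25]]
partial = [solve(R, s) for s in sources]

# Superposition: add the three partial current vectors entry by entry.
total = [sum(p[k] for p in partial) for k in range(3)]
full = solve(R, [30, 5, -25])
doubled = solve(R, [60, 10, -50])

print(total == full)                      # True
print(doubled == [2 * c for c in full])   # True
```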

Loop currents in a network can be used to determine the current in any branch of 
the network. If only one loop current passes through a branch, such as from B to D 
in Fig. 1, the branch current equals the loop current. If more than one loop current 
passes through a branch, such as from A to B, the branch current is the algebraic sum 
of the loop currents in the branch (Kirchhoff’s current law). For instance, the current in
branch AB is $I_1 - I_2 = 3 - 1 = 2$ amps, in the direction of $I_1$. The current in branch
CD is $I_2 - I_3 = 9$ amps.



















Difference Equations 

In many fields such as ecology, economics, and engineering, a need arises to model 
mathematically a dynamic system that changes over time. Several features of the system 
are each measured at discrete time intervals, producing a sequence of vectors $\mathbf{x}_0$, $\mathbf{x}_1$, $\mathbf{x}_2$, .... The entries in $\mathbf{x}_k$ provide information about the state of the system at the time of the $k$th measurement.

If there is a matrix $A$ such that $\mathbf{x}_1 = A\mathbf{x}_0$, $\mathbf{x}_2 = A\mathbf{x}_1$, and, in general,

$$
\mathbf{x}_{k+1} = A\mathbf{x}_k \quad \text{for } k = 0, 1, 2, \ldots \tag{5}
$$


then (5) is called a linear difference equation (or recurrence relation). Given such 
an equation, one can compute $\mathbf{x}_1$, $\mathbf{x}_2$, and so on, provided $\mathbf{x}_0$ is known. Sections 4.8
and 4.9, and several sections in Chapter 5, will develop formulas for $\mathbf{x}_k$ and describe
what can happen to $\mathbf{x}_k$ as $k$ increases indefinitely. The discussion below illustrates how
a difference equation might arise. 

A subject of interest to demographers is the movement of populations or groups of 
people from one region to another. The simple model here considers the changes in the 
population of a certain city and its surrounding suburbs over a period of years. 

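Fix an initial year (say, 2000) and denote the populations of the city and suburbs that year by $r_0$ and $s_0$, respectively. Let $\mathbf{x}_0$ be the population vector

$$
\mathbf{x}_0 = \begin{bmatrix} r_0 \\ s_0 \end{bmatrix} \quad
\begin{matrix} \text{City population, 2000} \\ \text{Suburban population, 2000} \end{matrix}
$$

For 2001 and subsequent years, denote the populations of the city and suburbs by the vectors

$$
\mathbf{x}_1 = \begin{bmatrix} r_1 \\ s_1 \end{bmatrix}, \quad
\mathbf{x}_2 = \begin{bmatrix} r_2 \\ s_2 \end{bmatrix}, \quad
\mathbf{x}_3 = \begin{bmatrix} r_3 \\ s_3 \end{bmatrix}, \quad \ldots
$$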


Our goal is to describe mathematically how these vectors might be related. 

Suppose demographic studies show that each year about 5% of the city’s population 
moves to the suburbs (and 95% remains in the city), while 3% of the suburban population 
moves to the city (and 97% remains in the suburbs). See Fig. 2. 




FIGURE 2 Annual percentage migration between city and suburbs. 


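After 1 year, the original $r_0$ persons in the city are now distributed between city and suburbs as

$$
\begin{bmatrix} .95r_0 \\ .05r_0 \end{bmatrix} = r_0 \begin{bmatrix} .95 \\ .05 \end{bmatrix} \quad
\begin{matrix} \text{Remain in city} \\ \text{Move to suburbs} \end{matrix}
\tag{6}
$$

The $s_0$ persons in the suburbs in 2000 are distributed 1 year later as

$$
s_0 \begin{bmatrix} .03 \\ .97 \end{bmatrix} \quad
\begin{matrix} \text{Move to city} \\ \text{Remain in suburbs} \end{matrix}
\tag{7}
$$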






























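The vectors in (6) and (7) account for all of the population in 2001.³ Thus

$$
\begin{bmatrix} r_1 \\ s_1 \end{bmatrix}
= r_0 \begin{bmatrix} .95 \\ .05 \end{bmatrix} + s_0 \begin{bmatrix} .03 \\ .97 \end{bmatrix}
= \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} r_0 \\ s_0 \end{bmatrix}
$$

That is,

$$
\mathbf{x}_1 = M\mathbf{x}_0 \tag{8}
$$

where $M$ is the migration matrix determined by the following table:

          From:
    City    Suburbs     To:
    .95      .03        City
    .05      .97        Suburbs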

Equation (8) describes how the population changes from 2000 to 2001. If the migration 
percentages remain constant, then the change from 2001 to 2002 is given by 

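$$
\mathbf{x}_2 = M\mathbf{x}_1
$$

and similarly for 2002 to 2003 and subsequent years. In general,

$$
\mathbf{x}_{k+1} = M\mathbf{x}_k \quad \text{for } k = 0, 1, 2, \ldots \tag{9}
$$

The sequence of vectors $\{\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots\}$ describes the population of the city/suburban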
region over a period of years. 


EXAMPLE 3 Compute the population of the region just described for the years 
2001 and 2002, given that the population in 2000 was 600,000 in the city and 400,000 
in the suburbs. 


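SOLUTION The initial population in 2000 is $\mathbf{x}_0 = \begin{bmatrix} 600{,}000 \\ 400{,}000 \end{bmatrix}$. For 2001,

$$
\mathbf{x}_1 = M\mathbf{x}_0 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} 600{,}000 \\ 400{,}000 \end{bmatrix} = \begin{bmatrix} 582{,}000 \\ 418{,}000 \end{bmatrix}
$$

For 2002,

$$
\mathbf{x}_2 = M\mathbf{x}_1 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} 582{,}000 \\ 418{,}000 \end{bmatrix} = \begin{bmatrix} 565{,}440 \\ 434{,}560 \end{bmatrix}
$$

■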
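
The repeated multiplication in Example 3 is easy to automate. The sketch below is illustrative only: the function names `step` and `trajectory` are hypothetical, and the migration rates are stored as exact fractions so that the yearly totals can be compared without rounding error.

```python
from fractions import Fraction

# Migration matrix M from the table above (columns: from city, from suburbs).
M = [
    [Fraction(95, 100), Fraction(3, 100)],
    [Fraction(5, 100), Fraction(97, 100)],
]


def step(M, x):
    """Return M x, the population vector one year after x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def trajectory(M, x0, years):
    """Return [x_0, x_1, ..., x_years] for the difference equation x_{k+1} = M x_k."""
    xs = [x0]
    for _ in range(years):
        xs.append(step(M, xs[-1]))
    return xs


# Initial populations from Example 3: 600,000 in the city, 400,000 in the suburbs.
xs = trajectory(M, [600000, 400000], 2)

for year, (city, suburbs) in zip(range(2000, 2003), xs):
    print(year, int(city), int(suburbs))
```

Calling `trajectory` with a larger number of years gives the longer sequences that some of the [M] exercises ask for.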


The model for population movement in (9) is linear because the correspondence
$\mathbf{x}_k \mapsto \mathbf{x}_{k+1}$ is a linear transformation. The linearity depends on two facts: the number
of people who chose to move from one area to another is proportional to the number of
people in that area, as shown in (6) and (7), and the cumulative effect of these choices
is found by adding the movement of people from the different areas.


PRACTICE PROBLEM 

Find a matrix A and vectors x and b such that the problem in Example 1 amounts to 
solving the equation Ax = b. 


3 For simplicity, we ignore other influences on the population such as births, deaths, and migration into and
out of the city/suburban region.

1.10 EXERCISES


1. The container of a breakfast cereal usually lists the number 
of calories and the amounts of protein, carbohydrate, and 
fat contained in one serving of the cereal. The amounts for 
two common cereals are given below. Suppose a mixture of 
these two cereals is to be prepared that contains exactly 295 
calories, 9 g of protein, 48 g of carbohydrate, and 8 g of fat. 

a. Set up a vector equation for this problem. Include a state¬ 
ment of what the variables in your equation represent. 

b. Write an equivalent matrix equation, and then determine 
if the desired mixture of the two cereals can be prepared. 


Nutrition Information per Serving

    Nutrient            General Mills Cheerios®    Quaker® 100% Natural Cereal
    Calories                    110                          130
    Protein (g)                   4                            3
    Carbohydrate (g)             20                           18
    Fat (g)                       2                            5

2. One serving of Shredded Wheat supplies 160 calories, 5 g of protein, 6 g of fiber, and 1 g of fat. One serving of Crispix® supplies 110 calories, 2 g of protein, .1 g of fiber, and .4 g of fat.

a. Set up a matrix B and a vector u such that Bu gives the amounts of calories, protein, fiber, and fat contained in a mixture of three servings of Shredded Wheat and two servings of Crispix.

b. [M] Suppose that you want a cereal with more fiber than Crispix but fewer calories than Shredded Wheat. Is it possible for a mixture of the two cereals to supply 130 calories, 3.20 g of protein, 2.46 g of fiber, and .64 g of fat? If so, what is the mixture?

3. After taking a nutrition class, a big Annie’s® Mac and Cheese fan decides to improve the levels of protein and fiber in her favorite lunch by adding broccoli and canned chicken. The nutritional information for the foods referred to in this exercise is given in the table below.

Nutrition Information per Serving

    Nutrient       Mac and Cheese    Broccoli    Chicken    Shells
    Calories            270             51          70        260
    Protein (g)          10            5.4          15          9
    Fiber (g)             2            5.2           0          5

a. [M] If she wants to limit her lunch to 400 calories but get 30 g of protein and 10 g of fiber, what proportions of servings of Mac and Cheese, broccoli, and chicken should she use?

b. [M] She found that there was too much broccoli in the proportions from part (a), so she decided to switch from classical Mac and Cheese to Annie’s® Whole Wheat Shells and White Cheddar. What proportions of servings of each food should she use to meet the same goals as in part (a)?

4. The Cambridge Diet supplies .8 g of calcium per day, in addition to the nutrients listed in Table 1 for Example 1. The amounts of calcium per unit (100 g) supplied by the three ingredients in the Cambridge Diet are as follows: 1.26 g from nonfat milk, .19 g from soy flour, and .8 g from whey. Another ingredient in the diet mixture is isolated soy protein, which provides the following nutrients in each unit: 80 g of protein, 0 g of carbohydrate, 3.4 g of fat, and .18 g of calcium.

a. Set up a matrix equation whose solution determines the amounts of nonfat milk, soy flour, whey, and isolated soy protein necessary to supply the precise amounts of protein, carbohydrate, fat, and calcium in the Cambridge Diet. State what the variables in the equation represent.

b. [M] Solve the equation in (a) and discuss your answer.


In Exercises 5-8, write a matrix equation that determines the loop 
currents. [M] If MATLAB or another matrix program is available, 
solve the system for the loop currents. 



[Circuit diagrams for Exercises 5-8 are not reproduced. Surviving labels: resistors of 2 Ω, 4 Ω, 1 Ω, and 2 Ω; voltage sources of 50 V and 40 V in Exercise 8.]



9. In a certain region, about 7% of a city’s population moves 
to the surrounding suburbs each year, and about 5% of the 
suburban population moves into the city. In 2010, there were 
800,000 residents in the city and 500,000 in the suburbs. 
Set up a difference equation that describes this situation, 
where $\mathbf{x}_0$ is the initial population in 2010. Then estimate
the populations in the city and in the suburbs two years 
later, in 2012. (Ignore other factors that might influence the 
population sizes.) 

10. In a certain region, about 6% of a city’s population moves
to the surrounding suburbs each year, and about 4% of the
suburban population moves into the city. In 2010, there were
10,000,000 residents in the city and 800,000 in the suburbs.
Set up a difference equation that describes this situation,
where $\mathbf{x}_0$ is the initial population in 2010. Then estimate the
populations in the city and in the suburbs two years later, in
2012.

11. In 1994, the population of California was 31,524,000, and
the population living in the United States but outside California
was 228,680,000. During the year, it is estimated that
516,100 persons moved from California to elsewhere in the
United States, while 381,262 persons moved to California
from elsewhere in the United States.⁴

a. Set up the migration matrix for this situation, using five 
decimal places for the migration rates into and out of 
California. Let your work show how you produced the 
migration matrix. 

b. [M] Compute the projected populations in the year 2000
for California and elsewhere in the United States, assuming
that the migration rates did not change during the 6-year
period. (These calculations do not take into account
births, deaths, or the substantial migration of persons into
California and elsewhere in the United States from other
countries.)


4 Migration data supplied by the Demographic Research Unit of the 
California State Department of Finance. 


12. [M] Budget® Rent A Car in Wichita, Kansas has a fleet of 
about 500 cars, at three locations. A car rented at one location 
may be returned to any of the three locations. The various 
fractions of cars returned to the three locations are shown in 
the matrix below. Suppose that on Monday there are 295 cars 
at the airport (or rented from there), 55 cars at the east side 
office, and 150 cars at the west side office. What will be the 
approximate distribution of cars on Wednesday? 


        Cars Rented From:
    Airport    East    West      Returned To:
      .97      .05     .10       Airport
      .00      .90     .05       East
      .03      .05     .85       West


13. [M] Let $M$ and $\mathbf{x}_0$ be as in Example 3.

a. Compute the population vectors for k = 1,..., 20. 
Discuss what you find. 

b. Repeat part (a) with an initial population of 350,000 in 
the city and 650,000 in the suburbs. What do you find? 

14. [M] Study how changes in boundary temperatures on a steel plate affect the temperatures at interior points on the plate.

a. Begin by estimating the temperatures $T_1$, $T_2$, $T_3$, $T_4$ at each of the sets of four points on the steel plate shown in the figure. In each case, the value of $T_k$ is approximated by the average of the temperatures at the four closest points. See Exercises 33 and 34 in Section 1.1, where the values (in degrees) turn out to be (20, 27.5, 30, 22.5). How is this list of values related to your results for the points in set (a) and set (b)?

b. Without making any computations, guess the interior temperatures in (a) when the boundary temperatures are all multiplied by 3. Check your guess.

c. Finally, make a general conjecture about the correspondence from the list of eight boundary temperatures to the list of four interior temperatures.


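[Figure: Plate A, labeled (a), and Plate B, labeled (b), each with interior points 1, 2, 3, 4 and boundary temperatures. The diagram is not reproduced; boundary labels surviving in extraction order are 20°, 20°, 20°, 20°, 0°, 0°, 40°, 40°, 10°, 10°.]

SOLUTION TO PRACTICE PROBLEM

$$
A = \begin{bmatrix} 36 & 51 & 13 \\ 52 & 34 & 74 \\ 0 & 7 & 1.1 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 33 \\ 45 \\ 3 \end{bmatrix}
$$
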


CHAPTER 1 SUPPLEMENTARY EXERCISES 


1. Mark each statement True or False. Justify each answer. (If
true, cite appropriate facts or theorems. If false, explain why
or give a counterexample that shows why the statement is not
true in every case.)

a. Every matrix is row equivalent to a unique matrix in 
echelon form. 

b. Any system of n linear equations in n variables has at 
most n solutions. 

c. If a system of linear equations has two different solutions, it must have infinitely many solutions.

d. If a system of linear equations has no free variables, then it has a unique solution.

e. If an augmented matrix [ A b ] is transformed into [ C d ] by elementary row operations, then the equations Ax = b and Cx = d have exactly the same solution sets.

f. If a system Ax = b has more than one solution, then so does the system Ax = 0.

g. If A is an $m \times n$ matrix and the equation Ax = b is consistent for some b, then the columns of A span $\mathbb{R}^m$.

h. If an augmented matrix [ A b ] can be transformed by 
elementary row operations into reduced echelon form, 
then the equation Ax = b is consistent. 

i. If matrices A and B are row equivalent, they have the 
same reduced echelon form. 

j. The equation Ax = 0 has the trivial solution if and only 
if there are no free variables. 

k. If A is an $m \times n$ matrix and the equation Ax = b is consistent for every b in $\mathbb{R}^m$, then A has $m$ pivot columns.

l. If an $m \times n$ matrix A has a pivot position in every row, then the equation Ax = b has a unique solution for each b in $\mathbb{R}^m$.

m. If an $n \times n$ matrix A has $n$ pivot positions, then the reduced echelon form of A is the $n \times n$ identity matrix.

n. If $3 \times 3$ matrices A and B each have three pivot positions, then A can be transformed into B by elementary row operations.

o. If A is an $m \times n$ matrix, if the equation Ax = b has at least two different solutions, and if the equation Ax = c is consistent, then the equation Ax = c has many solutions.

p. If A and B are row equivalent $m \times n$ matrices and if the columns of A span $\mathbb{R}^m$, then so do the columns of B.

q. If none of the vectors in the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ in $\mathbb{R}^3$ is a multiple of one of the other vectors, then S is linearly independent.

r. If {u, v, w} is linearly independent, then u, v, and w are not in $\mathbb{R}^2$.

s. In some cases, it is possible for four vectors to span $\mathbb{R}^5$.

t. If u and v are in $\mathbb{R}^m$, then −u is in Span{u, v}.

u. If u, v, and w are nonzero vectors in $\mathbb{R}^2$, then w is a linear combination of u and v.

v. If w is a linear combination of u and v in $\mathbb{R}^n$, then u is a linear combination of v and w.

w. Suppose that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are in $\mathbb{R}^5$, $\mathbf{v}_2$ is not a multiple of $\mathbf{v}_1$, and $\mathbf{v}_3$ is not a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$. Then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent.

x. A linear transformation is a function.

y. If A is a $6 \times 5$ matrix, the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ cannot map $\mathbb{R}^5$ onto $\mathbb{R}^6$.

z. If A is an $m \times n$ matrix with $m$ pivot columns, then the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ is a one-to-one mapping.

2. Let a and b represent real numbers. Describe the possible 
solution sets of the (linear) equation ax = b. [Hint: The 
number of solutions depends upon a and b.] 

3. The solutions $(x, y, z)$ of a single linear equation

$$
ax + by + cz = d
$$

form a plane in $\mathbb{R}^3$ when $a$, $b$, and $c$ are not all zero. Construct sets of three linear equations whose graphs (a) intersect in a single line, (b) intersect in a single point, and (c) have no points in common. Typical graphs are illustrated in the figure.

[Figure panels: (a) Three planes intersecting in a line; (b) Three planes intersecting in a point; (c) Three planes with no intersection; (c′) Three planes with no intersection.]

4. Suppose the coefficient matrix of a linear system of three equations in three variables has a pivot position in each column. Explain why the system has a unique solution.

5. Determine $h$ and $k$ such that the solution set of the system (i) is empty, (ii) contains a unique solution, and (iii) contains infinitely many solutions.

$$
\text{a.}\ \begin{aligned} x_1 + 3x_2 &= k \\ 4x_1 + hx_2 &= 8 \end{aligned}
\qquad\qquad
\text{b.}\ \begin{aligned} -2x_1 + hx_2 &= 1 \\ 6x_1 + kx_2 &= -2 \end{aligned}
$$

6. Consider the problem of determining whether the following system of equations is consistent:

$$
\begin{aligned} 4x_1 - 2x_2 + 7x_3 &= -5 \\ 8x_1 - 3x_2 + 10x_3 &= -3 \end{aligned}
$$

a. Define appropriate vectors, and restate the problem in terms of linear combinations. Then solve that problem.

b. Define an appropriate matrix, and restate the problem using the phrase “columns of A.”

c. Define an appropriate linear transformation T using the matrix in (b), and restate the problem in terms of T.

7. Consider the problem of determining whether the following system of equations is consistent for all $b_1$, $b_2$, $b_3$:

$$
\begin{aligned} 2x_1 - 4x_2 - 2x_3 &= b_1 \\ -5x_1 + x_2 + x_3 &= b_2 \\ 7x_1 - 5x_2 - 3x_3 &= b_3 \end{aligned}
$$

a. Define appropriate vectors, and restate the problem in terms of Span $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Then solve that problem.

b. Define an appropriate matrix, and restate the problem using the phrase “columns of A.”

c. Define an appropriate linear transformation T using the matrix in (b), and restate the problem in terms of T.

8. Describe the possible echelon forms of the matrix A. Use the notation of Example 1 in Section 1.2.

a. A is a $2 \times 3$ matrix whose columns span $\mathbb{R}^2$.

b. A is a $3 \times 3$ matrix whose columns span $\mathbb{R}^3$.

9. Write the vector $\begin{bmatrix} 5 \\ 6 \end{bmatrix}$ as the sum of two vectors, one on the line $\{(x, y) : y = 2x\}$ and one on the line $\{(x, y) : y = x/2\}$.

10. Let $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{b}$ be the vectors in $\mathbb{R}^2$ shown in the figure, and let $A = [\,\mathbf{a}_1 \ \ \mathbf{a}_2\,]$. Does the equation $A\mathbf{x} = \mathbf{b}$ have a solution? If so, is the solution unique? Explain.


11. Construct a $2 \times 3$ matrix A, not in echelon form, such that the solution of Ax = 0 is a line in $\mathbb{R}^3$.

12. Construct a $2 \times 3$ matrix A, not in echelon form, such that the solution of Ax = 0 is a plane in $\mathbb{R}^3$.

13. Write the reduced echelon form of a $3 \times 3$ matrix A such that the first two columns of A are pivot columns and

$$
A \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
$$


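14. Determine the value(s) of $a$ such that $\left\{ \begin{bmatrix} 1 \\ a \end{bmatrix}, \begin{bmatrix} a \\ a + 2 \end{bmatrix} \right\}$ is linearly independent.

15. In (a) and (b), suppose the vectors are linearly independent. What can you say about the numbers $a, \ldots, f$? Justify your answers. [Hint: Use a theorem for (b).]

$$
\text{a.}\ \begin{bmatrix} a \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} b \\ c \\ 0 \end{bmatrix}, \begin{bmatrix} d \\ e \\ f \end{bmatrix}
\qquad\qquad
\text{b.}\ \begin{bmatrix} a \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} b \\ c \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} d \\ e \\ f \\ 1 \end{bmatrix}
$$

16. Use Theorem 7 in Section 1.7 to explain why the columns of the matrix A are linearly independent.

$$
A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 5 & 0 & 0 \\ 3 & 6 & 8 & 0 \\ 4 & 7 & 9 & 10 \end{bmatrix}
$$

17. Explain why a set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ in $\mathbb{R}^5$ must be linearly independent when $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent and $\mathbf{v}_4$ is not in Span $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

18. Suppose $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a linearly independent set in $\mathbb{R}^n$. Show that $\{\mathbf{v}_1, \mathbf{v}_1 + \mathbf{v}_2\}$ is also linearly independent.

19. Suppose $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ are distinct points on one line in $\mathbb{R}^3$. The line need not pass through the origin. Show that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly dependent.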

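20. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, and suppose $T(\mathbf{u}) = \mathbf{v}$. Show that $T(-\mathbf{u}) = -\mathbf{v}$.

21. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation that reflects each vector through the plane $x_2 = 0$. That is, $T(x_1, x_2, x_3) = (x_1, -x_2, x_3)$. Find the standard matrix of T.

22. Let A be a $3 \times 3$ matrix with the property that the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps $\mathbb{R}^3$ onto $\mathbb{R}^3$. Explain why the transformation must be one-to-one.

23. A Givens rotation is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$ used in computer programs to create a zero entry in a vector (usually a column of a matrix). The standard matrix of a Givens rotation in $\mathbb{R}^2$ has the form

$$
\begin{bmatrix} a & -b \\ b & a \end{bmatrix}, \quad a^2 + b^2 = 1
$$

Find $a$ and $b$ such that $\begin{bmatrix} 4 \\ 3 \end{bmatrix}$ is rotated into $\begin{bmatrix} 5 \\ 0 \end{bmatrix}$.

[Figure: A Givens rotation in $\mathbb{R}^2$.]

24. The following equation describes a Givens rotation in $\mathbb{R}^3$. Find $a$ and $b$.

$$
\begin{bmatrix} a & 0 & -b \\ 0 & 1 & 0 \\ b & 0 & a \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 2\sqrt{5} \\ 3 \\ 0 \end{bmatrix}, \quad a^2 + b^2 = 1
$$
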


25. A large apartment building is to be built using modular construction techniques. The arrangement of apartments on any particular floor is to be chosen from one of three basic floor plans. Plan A has 18 apartments on one floor, including 3 three-bedroom units, 7 two-bedroom units, and 8 one-bedroom units. Each floor of plan B includes 4 three-bedroom units, 4 two-bedroom units, and 8 one-bedroom units. Each floor of plan C includes 5 three-bedroom units, 3 two-bedroom units, and 9 one-bedroom units. Suppose the building contains a total of $x_1$ floors of plan A, $x_2$ floors of plan B, and $x_3$ floors of plan C.

a. What interpretation can be given to the vector $x_1 \begin{bmatrix} 3 \\ 7 \\ 8 \end{bmatrix}$?

b. Write a formal linear combination of vectors that expresses the total numbers of three-, two-, and one-bedroom apartments contained in the building.

c. [M] Is it possible to design the building with exactly 66 three-bedroom units, 74 two-bedroom units, and 136 one-bedroom units? If so, is there more than one way to do it? Explain your answer.


Matrix Algebra


INTRODUCTORY EXAMPLE 


Computer Models in Aircraft Design 



To design the next generation of commercial and military 
aircraft, engineers at Boeing’s Phantom Works use 3D 
modeling and computational fluid dynamics (CFD). They 
study the airflow around a virtual airplane to answer 
important design questions before physical models are 
created. This has drastically reduced design cycle times 
and cost—and linear algebra plays a crucial role in the 
process. 

The virtual airplane begins as a mathematical “wire-frame”
model that exists only in computer memory and
on graphics display terminals. (A model of a Boeing 
777 is shown.) This mathematical model organizes and 
influences each step of the design and manufacture of the 
airplane—both the exterior and interior. The CFD analysis 
concerns the exterior surface. 

Although the finished skin of a plane may seem 
smooth, the geometry of the surface is complicated. In 
addition to wings and a fuselage, an aircraft has nacelles, 
stabilizers, slats, flaps, and ailerons. The way air flows 
around these structures determines how the plane moves 
through the sky. Equations that describe the airflow are 
complicated, and they must account for engine intake, 
engine exhaust, and the wakes left by the wings of the 
plane. To study the airflow, engineers need a highly refined 
description of the plane’s surface. 

A computer creates a model of the surface by first 
superimposing a three-dimensional grid of “boxes” on the 


original wire-frame model. Boxes in this grid lie either 
completely inside or completely outside the plane, or they 
intersect the surface of the plane. The computer selects 
the boxes that intersect the surface and subdivides them, 
retaining only the smaller boxes that still intersect the 
surface. The subdividing process is repeated until the grid 
is extremely fine. A typical grid can include over 400,000 
boxes. 

The process for finding the airflow around the plane 
involves repeatedly solving a system of linear equations 
Ax = b that may involve up to 2 million equations and 
variables. The vector b changes each time, based on data 
from the grid and solutions of previous equations. Using 
the fastest computers available commercially, a Phantom 
Works team can spend from a few hours to several days 
setting up and solving a single airflow problem. After the 
team analyzes the solution, they may make small changes 
to the airplane surface and begin the whole process again. 
Thousands of CFD runs may be required. 

This chapter presents two important concepts that 
assist in the solution of such massive systems of equations: 

• Partitioned matrices: A typical CFD system 
of equations has a “sparse” coefficient matrix 
with mostly zero entries. Grouping the variables 
correctly leads to a partitioned matrix with many 
zero blocks. Section 2.4 introduces such matrices 
and describes some of their applications.

• Matrix factorizations: Even when written with
partitioned matrices, the system of equations is
complicated. To further simplify the computations, 
the CFD software at Boeing uses what is called 
an LU factorization of the coefficient matrix. 
Section 2.5 discusses LU and other useful matrix 
factorizations. Further details about factorizations 
appear at several points later in the text. 

To analyze a solution of an airflow system, engineers 
want to visualize the airflow over the surface of the plane. 
They use computer graphics, and linear algebra provides 
the engine for the graphics. The wire-frame model of the 
plane’s surface is stored as data in many matrices. Once the 
image has been rendered on a computer screen, engineers 
can change its scale, zoom in or out of small regions, and 
rotate the image to see parts that may be hidden from view. 
Each of these operations is accomplished by appropriate
matrix multiplications. Section 2.7 explains the basic
ideas.

[Figure caption: Modern CFD has revolutionized wing design. The Boeing
Blended Wing Body is in design for the year 2020 or sooner.]



Our ability to analyze and solve equations will be greatly enhanced when we can perform 
algebraic operations with matrices. Furthermore, the definitions and theorems in this 
chapter provide some basic tools for handling the many applications of linear algebra 
that involve two or more matrices. For square matrices, the Invertible Matrix Theorem 
in Section 2.3 ties together most of the concepts treated earlier in the text. Sections 2.4 
and 2.5 examine partitioned matrices and matrix factorizations, which appear in most 
modern uses of linear algebra. Sections 2.6 and 2.7 describe two interesting applications 
of matrix algebra, to economics and to computer graphics. 


2.1 MATRIX OPERATIONS 

If $A$ is an $m \times n$ matrix (that is, a matrix with $m$ rows and $n$ columns), then the scalar entry in the $i$th row and $j$th column of $A$ is denoted by $a_{ij}$ and is called the $(i, j)$-entry of $A$. See Fig. 1. For instance, the (3,2)-entry is the number $a_{32}$ in the third row, second column. Each column of $A$ is a list of $m$ real numbers, which identifies a vector in $\mathbb{R}^m$. Often, these columns are denoted by $\mathbf{a}_1, \ldots, \mathbf{a}_n$, and the matrix $A$ is written as

$$
A = [\,\mathbf{a}_1 \ \ \mathbf{a}_2 \ \ \cdots \ \ \mathbf{a}_n\,]
$$

Observe that the number $a_{ij}$ is the $i$th entry (from the top) of the $j$th column vector $\mathbf{a}_j$.

The diagonal entries in an $m \times n$ matrix $A = [\,a_{ij}\,]$ are $a_{11}, a_{22}, a_{33}, \ldots$, and they form the main diagonal of $A$. A diagonal matrix is a square $n \times n$ matrix whose nondiagonal entries are zero. An example is the $n \times n$ identity matrix, $I_n$. An $m \times n$ matrix whose entries are all zero is a zero matrix and is written as 0. The size of a zero matrix is usually clear from the context.

$$
A = \begin{bmatrix}
a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\
\vdots & & \vdots & & \vdots \\
a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & & \vdots & & \vdots \\
a_{m1} & \cdots & a_{mj} & \cdots & a_{mn}
\end{bmatrix}
= [\,\mathbf{a}_1 \ \ \cdots \ \ \mathbf{a}_j \ \ \cdots \ \ \mathbf{a}_n\,]
$$

FIGURE 1 Matrix notation. (Row $i$ and column $j$ meet at the entry $a_{ij}$.)


Sums and Scalar Multiples 

The arithmetic for vectors described earlier has a natural extension to matrices. We
say that two matrices are equal if they have the same size (i.e., the same number of
rows and the same number of columns) and if their corresponding columns are equal,
which amounts to saying that their corresponding entries are equal. If $A$ and $B$ are
$m \times n$ matrices, then the sum $A + B$ is the $m \times n$ matrix whose columns are the sums
of the corresponding columns in $A$ and $B$. Since vector addition of the columns is done
entrywise, each entry in $A + B$ is the sum of the corresponding entries in $A$ and $B$. The
sum $A + B$ is defined only when $A$ and $B$ are the same size.


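EXAMPLE 1 Let

$$
A = \begin{bmatrix} 4 & 0 & 5 \\ -1 & 3 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 5 & 7 \end{bmatrix}, \quad
C = \begin{bmatrix} 2 & -3 \\ 0 & 1 \end{bmatrix}
$$

Then

$$
A + B = \begin{bmatrix} 5 & 1 & 6 \\ 2 & 8 & 9 \end{bmatrix}
$$
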

but A + C is not defined because A and C have different sizes. 


■ 


If $r$ is a scalar and $A$ is a matrix, then the scalar multiple $rA$ is the matrix whose
columns are $r$ times the corresponding columns in $A$. As with vectors, $-A$ stands for
$(-1)A$, and $A - B$ is the same as $A + (-1)B$.


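EXAMPLE 2 If $A$ and $B$ are the matrices in Example 1, then

$$
2B = 2 \begin{bmatrix} 1 & 1 & 1 \\ 3 & 5 & 7 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 \\ 6 & 10 & 14 \end{bmatrix}
$$

$$
A - 2B = \begin{bmatrix} 4 & 0 & 5 \\ -1 & 3 & 2 \end{bmatrix} - \begin{bmatrix} 2 & 2 & 2 \\ 6 & 10 & 14 \end{bmatrix} = \begin{bmatrix} 2 & -2 & 3 \\ -7 & -7 & -12 \end{bmatrix}
$$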


■ 


It was unnecessary in Example 2 to compute $A - 2B$ as $A + (-1)2B$ because the
usual rules of algebra apply to sums and scalar multiples of matrices, as the following
theorem shows.


THEOREM 1

Let $A$, $B$, and $C$ be matrices of the same size, and let $r$ and $s$ be scalars.

a. $A + B = B + A$
b. $(A + B) + C = A + (B + C)$
c. $A + 0 = A$
d. $r(A + B) = rA + rB$
e. $(r + s)A = rA + sA$
f. $r(sA) = (rs)A$

Each equality in Theorem 1 is verified by showing that the matrix on the left side has
the same size as the matrix on the right and that corresponding columns are equal. Size
is no problem because $A$, $B$, and $C$ are equal in size. The equality of columns follows
immediately from analogous properties of vectors. For instance, if the $j$th columns of
$A$, $B$, and $C$ are $\mathbf{a}_j$, $\mathbf{b}_j$, and $\mathbf{c}_j$, respectively, then the $j$th columns of $(A + B) + C$
and $A + (B + C)$ are

$$
(\mathbf{a}_j + \mathbf{b}_j) + \mathbf{c}_j \quad \text{and} \quad \mathbf{a}_j + (\mathbf{b}_j + \mathbf{c}_j)
$$

respectively. Since these two vector sums are equal for each $j$, property (b) is verified.

Because of the associative property of addition, we can simply write $A + B + C$
for the sum, which can be computed either as $(A + B) + C$ or as $A + (B + C)$. The
same applies to sums of four or more matrices.
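
The entrywise definitions of the sum and the scalar multiple translate directly into code. The following sketch is an illustration only: matrices are represented as lists of rows, the helper names `add` and `scale` are hypothetical, and the data are the matrices A, B, and C of Example 1.

```python
def add(A, B):
    """Return A + B, computed entrywise; the matrices must have the same size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("A + B is defined only for matrices of the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(r, A):
    """Return the scalar multiple rA."""
    return [[r * a for a in row] for row in A]


# Matrices from Example 1.
A = [[4, 0, 5], [-1, 3, 2]]
B = [[1, 1, 1], [3, 5, 7]]
C = [[2, -3], [0, 1]]

sum_AB = add(A, B)             # A + B
diff = add(A, scale(-2, B))    # A - 2B, computed as A + (-2)B

print(sum_AB)
print(diff)

# Spot checks of two properties from Theorem 1 on this particular data.
print(add(A, B) == add(B, A))                                # property (a)
print(scale(3, add(A, B)) == add(scale(3, A), scale(3, B)))  # property (d)

# A + C is not defined because the sizes differ.
try:
    add(A, C)
except ValueError as err:
    print("undefined:", err)
```

Checks on one pair of matrices do not prove Theorem 1, but they are a quick way to catch an arithmetic slip when working examples by hand.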


Matrix Multiplication 

When a matrix B multiplies a vector x, it transforms x into the vector Bx. If this vector 
is then multiplied in turn by a matrix A, the resulting vector is A(Bx). See Fig. 2. 


[Figure: multiplication by B maps x to Bx, and multiplication by A then maps Bx to A(Bx).]

FIGURE 2 Multiplication by B and then A.

Thus $A(B\mathbf{x})$ is produced from $\mathbf{x}$ by a composition of mappings, namely the linear
transformations studied in Section 1.8. Our goal is to represent this composite mapping as
multiplication by a single matrix, denoted by $AB$, so that

$$
A(B\mathbf{x}) = (AB)\mathbf{x} \tag{1}
$$


See Fig. 3. 


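[Figure: multiplication by B followed by multiplication by A, shown together with the single mapping from x to A(Bx) given by multiplication by AB.]

FIGURE 3 Multiplication by AB.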


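If $A$ is $m \times n$, $B$ is $n \times p$, and $\mathbf{x}$ is in $\mathbb{R}^p$, denote the columns of $B$ by $\mathbf{b}_1, \ldots, \mathbf{b}_p$
and the entries in $\mathbf{x}$ by $x_1, \ldots, x_p$. Then

$$
B\mathbf{x} = x_1\mathbf{b}_1 + \cdots + x_p\mathbf{b}_p
$$

By the linearity of multiplication by $A$,

$$
\begin{aligned}
A(B\mathbf{x}) &= A(x_1\mathbf{b}_1) + \cdots + A(x_p\mathbf{b}_p) \\
&= x_1 A\mathbf{b}_1 + \cdots + x_p A\mathbf{b}_p
\end{aligned}
$$

The vector $A(B\mathbf{x})$ is a linear combination of the vectors $A\mathbf{b}_1, \ldots, A\mathbf{b}_p$, using the entries
in $\mathbf{x}$ as weights. In matrix notation, this linear combination is written as

$$
A(B\mathbf{x}) = [\,A\mathbf{b}_1 \ \ A\mathbf{b}_2 \ \ \cdots \ \ A\mathbf{b}_p\,]\,\mathbf{x}
$$

Thus multiplication by $[\,A\mathbf{b}_1 \ \ A\mathbf{b}_2 \ \ \cdots \ \ A\mathbf{b}_p\,]$ transforms $\mathbf{x}$ into $A(B\mathbf{x})$. We have
found the matrix we sought!

DEFINITION

If $A$ is an $m \times n$ matrix, and if $B$ is an $n \times p$ matrix with columns $\mathbf{b}_1, \ldots, \mathbf{b}_p$,
then the product $AB$ is the $m \times p$ matrix whose columns are $A\mathbf{b}_1, \ldots, A\mathbf{b}_p$. That
is,

$$
AB = A[\,\mathbf{b}_1 \ \ \mathbf{b}_2 \ \ \cdots \ \ \mathbf{b}_p\,] = [\,A\mathbf{b}_1 \ \ A\mathbf{b}_2 \ \ \cdots \ \ A\mathbf{b}_p\,]
$$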


This definition makes equation (1) true for all $\mathbf{x}$ in $\mathbb{R}^p$. Equation (1) proves that the
composite mapping in Fig. 3 is a linear transformation and that its standard matrix is
$AB$. Multiplication of matrices corresponds to composition of linear transformations.


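EXAMPLE 3 Compute $AB$, where $A = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 3 & 6 \\ 1 & -2 & 3 \end{bmatrix}$.

SOLUTION Write $B = [\,\mathbf{b}_1 \ \ \mathbf{b}_2 \ \ \mathbf{b}_3\,]$, and compute:

$$
A\mathbf{b}_1 = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 \\ -1 \end{bmatrix}, \quad
A\mathbf{b}_2 = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix} \begin{bmatrix} 3 \\ -2 \end{bmatrix} = \begin{bmatrix} 0 \\ 13 \end{bmatrix}, \quad
A\mathbf{b}_3 = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix} \begin{bmatrix} 6 \\ 3 \end{bmatrix} = \begin{bmatrix} 21 \\ -9 \end{bmatrix}
$$

Then

$$
AB = A[\,\mathbf{b}_1 \ \ \mathbf{b}_2 \ \ \mathbf{b}_3\,] = \begin{bmatrix} 11 & 0 & 21 \\ -1 & 13 & -9 \end{bmatrix}
$$

where the three columns are $A\mathbf{b}_1$, $A\mathbf{b}_2$, and $A\mathbf{b}_3$, in that order. ■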


Notice that since the first column of $AB$ is $A\mathbf{b}_1$, this column is a linear combination
of the columns of $A$ using the entries in $\mathbf{b}_1$ as weights. A similar statement is true for
each column of $AB$.


Each column of AB is a linear combination of the columns of A using weights
from the corresponding column of B.


The number of columns of $A$ must match the number of rows in $B$ in
order for a linear combination such as $A\mathbf{b}_1$ to be defined. Also, the definition of $AB$
shows that $AB$ has the same number of rows as $A$ and the same number of columns
as $B$.

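EXAMPLE 4 If $A$ is a $3 \times 5$ matrix and $B$ is a $5 \times 2$ matrix, what are the sizes of
$AB$ and $BA$, if they are defined?

SOLUTION Since $A$ has 5 columns and $B$ has 5 rows, the product $AB$ is defined and
is a $3 \times 2$ matrix:

\[
\underbrace{\begin{bmatrix} * & * & * & * & * \\ * & * & * & * & * \\ * & * & * & * & * \end{bmatrix}}_{A:\; 3 \times 5}
\;
\underbrace{\begin{bmatrix} * & * \\ * & * \\ * & * \\ * & * \\ * & * \end{bmatrix}}_{B:\; 5 \times 2}
=
\underbrace{\begin{bmatrix} * & * \\ * & * \\ * & * \end{bmatrix}}_{AB:\; 3 \times 2}
\]

The inner sizes (5 and 5) must match, and the outer sizes (3 and 2) give the size of $AB$.

The product $BA$ is not defined because the 2 columns of $B$ do not match the 3 rows
of $A$. ■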

The definition of AB is important for theoretical work and applications, but the 
following rule provides a more efficient method for calculating the individual entries in 
AB when working small problems by hand. 


ROW-COLUMN RULE FOR COMPUTING AB

If the product $AB$ is defined, then the entry in row $i$ and column $j$ of $AB$ is the
sum of the products of corresponding entries from row $i$ of $A$ and column $j$ of
$B$. If $(AB)_{ij}$ denotes the $(i, j)$-entry in $AB$, and if $A$ is an $m \times n$ matrix, then

\[ (AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} \]


To verify this rule, let $B = [\, \mathbf{b}_1 \;\; \cdots \;\; \mathbf{b}_p \,]$. Column $j$ of $AB$ is $A\mathbf{b}_j$, and we can
compute $A\mathbf{b}_j$ by the row-vector rule for computing $A\mathbf{x}$ from Section 1.4. The $i$th entry
in $A\mathbf{b}_j$ is the sum of the products of corresponding entries from row $i$ of $A$ and the
vector $\mathbf{b}_j$, which is precisely the computation described in the rule for computing the
$(i, j)$-entry of $AB$.
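The row-column rule translates almost word for word into code. The sketch below is an illustration only: the function names `entry` and `mat_mul` are hypothetical, matrices are lists of rows, and the indices `i` and `j` are counted from 1 to match the notation in the rule.

```python
def entry(A, B, i, j):
    """Return the (i, j)-entry of AB, with i and j counted from 1."""
    if len(A[0]) != len(B):
        raise ValueError("the columns of A must match the rows of B")
    # Sum of products of entries from row i of A and column j of B.
    return sum(A[i - 1][k] * B[k][j - 1] for k in range(len(B)))


def mat_mul(A, B):
    """Return AB, computing every entry by the row-column rule."""
    rows, cols = len(A), len(B[0])
    return [[entry(A, B, i, j) for j in range(1, cols + 1)]
            for i in range(1, rows + 1)]


# Illustrative data (an assumption of this sketch).
A = [[2, 3], [1, -5]]
B = [[4, 3, 6], [1, -2, 3]]

print(entry(A, B, 1, 3))   # row 1 of A with column 3 of B
print(entry(A, B, 2, 2))   # row 2 of A with column 2 of B
print(mat_mul(A, B))
```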


EXAMPLE 5 Use the row-column rule to compute two of the entries in AB for the 
matrices in Example 3. An inspection of the numbers involved will make it clear how 
the two methods for calculating AB produce the same matrix. 


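SOLUTION To find the entry in row 1 and column 3 of $AB$, consider row 1 of $A$ and
column 3 of $B$. Multiply corresponding entries and add the results, as shown below:

\[
AB = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}\begin{bmatrix} 4 & 3 & 6 \\ 1 & -2 & 3 \end{bmatrix}
= \begin{bmatrix} \square & \square & 2(6) + 3(3) \\ \square & \square & \square \end{bmatrix}
= \begin{bmatrix} \square & \square & 21 \\ \square & \square & \square \end{bmatrix}
\]

For the entry in row 2 and column 2 of $AB$, use row 2 of $A$ and column 2 of $B$:

\[
\begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}\begin{bmatrix} 4 & 3 & 6 \\ 1 & -2 & 3 \end{bmatrix}
= \begin{bmatrix} \square & \square & 21 \\ \square & 1(3) + (-5)(-2) & \square \end{bmatrix}
= \begin{bmatrix} \square & \square & 21 \\ \square & 13 & \square \end{bmatrix}
\]
■

EXAMPLE 6 Find the entries in the second row of $AB$, where

\[
A = \begin{bmatrix} 2 & -5 & 0 \\ -1 & 3 & -4 \\ 6 & -8 & -7 \\ -3 & 0 & 9 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & -6 \\ 7 & 1 \\ 3 & 2 \end{bmatrix}
\]

SOLUTION By the row-column rule, the entries of the second row of $AB$ come from
row 2 of $A$ (and the columns of $B$):

\[
\begin{bmatrix} 2 & -5 & 0 \\ -1 & 3 & -4 \\ 6 & -8 & -7 \\ -3 & 0 & 9 \end{bmatrix}
\begin{bmatrix} 4 & -6 \\ 7 & 1 \\ 3 & 2 \end{bmatrix}
= \begin{bmatrix} \square & \square \\ -4 + 21 - 12 & 6 + 3 - 8 \\ \square & \square \\ \square & \square \end{bmatrix}
= \begin{bmatrix} \square & \square \\ 5 & 1 \\ \square & \square \\ \square & \square \end{bmatrix}
\]
■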


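Notice that since Example 6 requested only the second row of $AB$, we could have
written just the second row of $A$ to the left of $B$ and computed

\[
\begin{bmatrix} -1 & 3 & -4 \end{bmatrix}\begin{bmatrix} 4 & -6 \\ 7 & 1 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} 5 & 1 \end{bmatrix}
\]

This observation about rows of $AB$ is true in general and follows from the row-column
rule. Let $\operatorname{row}_i(A)$ denote the $i$th row of a matrix $A$. Then

\[ \operatorname{row}_i(AB) = \operatorname{row}_i(A) \cdot B \qquad (2) \]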


Properties of Matrix Multiplication 

The following theorem lists the standard properties of matrix multiplication. Recall that
$I_m$ represents the $m \times m$ identity matrix and $I_m\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^m$.


THEOREM 2

Let $A$ be an $m \times n$ matrix, and let $B$ and $C$ have sizes for which the indicated
sums and products are defined.

a. $A(BC) = (AB)C$ (associative law of multiplication)

b. $A(B + C) = AB + AC$ (left distributive law)

c. $(B + C)A = BA + CA$ (right distributive law)

d. $r(AB) = (rA)B = A(rB)$ for any scalar $r$

e. $I_m A = A = A I_n$ (identity for matrix multiplication)


PROOF Properties (b)-(e) are considered in the exercises. Property (a) follows from
the fact that matrix multiplication corresponds to composition of linear transformations
(which are functions), and it is known (or easy to check) that the composition of
functions is associative. Here is another proof of (a) that rests on the “column definition” of
the product of two matrices. Let

\[ C = [\, \mathbf{c}_1 \;\; \cdots \;\; \mathbf{c}_p \,] \]

By the definition of matrix multiplication,

\[ BC = [\, B\mathbf{c}_1 \;\; \cdots \;\; B\mathbf{c}_p \,] \]

\[ A(BC) = [\, A(B\mathbf{c}_1) \;\; \cdots \;\; A(B\mathbf{c}_p) \,] \]

Recall from equation (1) that the definition of $AB$ makes $A(B\mathbf{x}) = (AB)\mathbf{x}$ for all $\mathbf{x}$, so

\[ A(BC) = [\, (AB)\mathbf{c}_1 \;\; \cdots \;\; (AB)\mathbf{c}_p \,] = (AB)C \]

The associative and distributive laws in Theorems 1 and 2 say essentially that pairs
of parentheses in matrix expressions can be inserted and deleted in the same way as in
the algebra of real numbers. In particular, we can write ABC for the product, which
can be computed either as A(BC) or as (AB)C.¹ Similarly, a product ABCD of four
matrices can be computed as A(BCD) or (ABC)D or A(BC)D, and so on. It does not
matter how we group the matrices when computing the product, so long as the
left-to-right order of the matrices is preserved.
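Although the grouping does not change the product, it can change the amount of arithmetic. The following Python sketch is an illustration under stated assumptions: the helper `mat_mul` is hypothetical, it returns both the product and the number of scalar multiplications used by the row-column rule, and the small matrices are made up for the demonstration. It compares (AB)C with A(BC) when C is a single column.

```python
def mat_mul(A, B):
    """Return (AB, count), where count is the number of scalar multiplications."""
    m, n, p = len(A), len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("the columns of A must match the rows of B")
    product = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
               for i in range(m)]
    return product, m * n * p    # each of the m*p entries uses n multiplications


# Illustrative matrices: B is square and C is a single column.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2], [-1]]

AB, cost_AB = mat_mul(A, B)
left, cost_left = mat_mul(AB, C)       # (AB)C
BC, cost_BC = mat_mul(B, C)
right, cost_right = mat_mul(A, BC)     # A(BC)

print(left == right)                   # the two groupings agree
print(cost_AB + cost_left, cost_BC + cost_right)
```

Here the grouping A(BC) never forms the full matrix AB, which is why it needs fewer multiplications for these sizes.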

The left-to-right order in products is critical because AB and BA are usually not 
the same. This is not surprising, because the columns of AB are linear combinations 
of the columns of A, whereas the columns of BA are constructed from the columns of 
B. The position of the factors in the product AB is emphasized by saying that A is 
right-multiplied by B or that B is left-multiplied by A. If AB = BA, we say that A and 
B commute with one another. 


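EXAMPLE 7 Let $A = \begin{bmatrix} 5 & 1 \\ 3 & -2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 0 \\ 4 & 3 \end{bmatrix}$. Show that these matrices do
not commute. That is, verify that $AB \neq BA$.

SOLUTION

\[
AB = \begin{bmatrix} 5 & 1 \\ 3 & -2 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 4 & 3 \end{bmatrix} = \begin{bmatrix} 14 & 3 \\ -2 & -6 \end{bmatrix}
\]

\[
BA = \begin{bmatrix} 2 & 0 \\ 4 & 3 \end{bmatrix}\begin{bmatrix} 5 & 1 \\ 3 & -2 \end{bmatrix} = \begin{bmatrix} 10 & 2 \\ 29 & -2 \end{bmatrix}
\]
■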


Example 7 illustrates the first of the following list of important differences between
matrix algebra and the ordinary algebra of real numbers. See Exercises 9-12 for
examples of these situations.


WARNINGS:

1. In general, $AB \neq BA$.

2. The cancellation laws do not hold for matrix multiplication. That is, if
$AB = AC$, then it is not true in general that $B = C$. (See Exercise 10.)

3. If a product $AB$ is the zero matrix, you cannot conclude in general that either
$A = 0$ or $B = 0$. (See Exercise 12.)
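Each warning can be confirmed with a small numerical example. In the Python sketch below, the helper `mat_mul` is hypothetical, and the matrices `P`, `Q`, `R`, `M`, and `N` are made up for illustration; they are not the matrices of the exercises. Only `A` and `B` are taken from Example 7.

```python
def mat_mul(A, B):
    """Return AB using the row-column rule (matrices are lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("the columns of A must match the rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Warning 1: AB and BA usually differ (matrices of Example 7).
A = [[5, 1], [3, -2]]
B = [[2, 0], [4, 3]]
AB, BA = mat_mul(A, B), mat_mul(B, A)
print(AB, BA, AB != BA)

# Warning 2: PQ = PR does not force Q = R (illustrative matrices).
P = [[1, 0], [0, 0]]
Q = [[1, 2], [3, 4]]
R = [[1, 2], [5, 6]]
print(mat_mul(P, Q) == mat_mul(P, R), Q != R)

# Warning 3: MN can be the zero matrix with M and N both nonzero.
M = [[1, 1], [1, 1]]
N = [[1, -1], [-1, 1]]
print(mat_mul(M, N))
```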


Powers of a Matrix 

If $A$ is an $n \times n$ matrix and if $k$ is a positive integer, then $A^k$ denotes the product of $k$
copies of $A$:

\[ A^k = \underbrace{A \cdots A}_{k} \]

If $A$ is nonzero and if $\mathbf{x}$ is in $\mathbb{R}^n$, then $A^k\mathbf{x}$ is the result of left-multiplying $\mathbf{x}$ by $A$
repeatedly $k$ times. If $k = 0$, then $A^0\mathbf{x}$ should be $\mathbf{x}$ itself. Thus $A^0$ is interpreted as the
identity matrix. Matrix powers are useful in both theory and applications (Sections 2.6,
4.9, and later in the text).

¹ When B is square and C has fewer columns than A has rows, it is more efficient to compute A(BC) than
(AB)C.


The Transpose of a Matrix

Given an $m \times n$ matrix $A$, the transpose of $A$ is the $n \times m$ matrix, denoted by $A^T$,
whose columns are formed from the corresponding rows of $A$.


EXAMPLE 8 Let

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad
B = \begin{bmatrix} -5 & 2 \\ 1 & -3 \\ 0 & 4 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -3 & 5 & -2 & 7 \end{bmatrix}
\]

Then

\[
A^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix}, \quad
B^T = \begin{bmatrix} -5 & 1 & 0 \\ 2 & -3 & 4 \end{bmatrix}, \quad
C^T = \begin{bmatrix} 1 & -3 \\ 1 & 5 \\ 1 & -2 \\ 1 & 7 \end{bmatrix}
\]
■


THEOREM 3

Let $A$ and $B$ denote matrices whose sizes are appropriate for the following sums
and products.

a. $(A^T)^T = A$

b. $(A + B)^T = A^T + B^T$

c. For any scalar $r$, $(rA)^T = rA^T$

d. $(AB)^T = B^T A^T$


Proofs of (a)-(c) are straightforward and are omitted. For (d), see Exercise 33.
Usually, $(AB)^T$ is not equal to $A^T B^T$, even when $A$ and $B$ have sizes such that the
product $A^T B^T$ is defined.

The generalization of Theorem 3(d) to products of more than two factors can be 
stated in words as follows: 


The transpose of a product of matrices equals the product of their transposes in 
the reverse order. 
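The reverse-order rule for transposes can be checked numerically. The sketch below is only an illustration: `transpose` and `mat_mul` are hypothetical helper names, and the matrices are sample data. With these sizes the product of the transposes in the original order is not even defined, which the final lines demonstrate.

```python
def transpose(A):
    """Return the transpose of A: its columns are the rows of A."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Return AB using the row-column rule (matrices are lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("the columns of A must match the rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Illustrative data: A is 2 x 2 and B is 2 x 3.
A = [[2, 3], [1, -5]]
B = [[4, 3, 6], [1, -2, 3]]

lhs = transpose(mat_mul(A, B))               # (AB)^T, a 3 x 2 matrix
rhs = mat_mul(transpose(B), transpose(A))    # B^T A^T, reverse order
print(lhs == rhs)

# A^T is 2 x 2 and B^T is 3 x 2, so A^T B^T is not defined for these sizes.
try:
    mat_mul(transpose(A), transpose(B))
    same_order_defined = True
except ValueError:
    same_order_defined = False
print(same_order_defined)
```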


The exercises contain numerical examples that illustrate properties of transposes. 
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6 . A 


, B 


7. If a matrix 乂 is 5 x 3 and the product AB is 5 x 7, what is 
the size of B? 


$. How many rows does B have if BC is a 5 x 4 matrix? 


9. Let A 


.What value(s) 


In Exercises 1 and 2, compute each matrix sum or product if it is 
defined. If an expression is undefined, explain why. Let 


A : 


C 


2 0-1 
4-5 2 

1 2 
-2 1 


B 


D 


4 


-3 


E 


1. -2A, B-2A, AC, CD 

2. A + 3B, 2C - 3E, DB, EC 


4 -3 
-3 5 

0 1 


In the rest of this exercise set and in those to follow, assume that 
each matrix expression is defined. That is, the sizes of the matrices 
(and vectors) involved “match” appropriately. 


3. Let A 


Compute $3I_2 - A$ and $(3I_2)A$.


4. Compute $A - 5I_3$ and $(5I_3)A$, where

5 - 1 3* 

A= -4 3-6 

-3 1 1 • 

In Exercises 5 and 6, compute the product $AB$ in two ways: (a) by
the definition, where $A\mathbf{b}_1$ and $A\mathbf{b}_2$ are computed separately, and
(b) by the row-column rule for computing $AB$.


5. A 


,B 


-2 


of $k$, if any, will make $AB = BA$?


10. Let A 


3


and C


. Verify that $AB = AC$ and yet $B \neq C$.


11. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}$ and $D = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$.
Compute $AD$ and $DA$. Explain how the columns or rows of $A$
change when $A$ is multiplied by $D$ on the right or on the left.
Find a $3 \times 3$ matrix $B$, not the identity matrix or the zero
matrix, such that $AB = BA$.


12. Let $A = \begin{bmatrix} 3 & -6 \\ -2 & 4 \end{bmatrix}$. Construct a $2 \times 2$ matrix $B$ such that
$AB$ is the zero matrix. Use two different nonzero columns
for $B$.



and B = 


NUMERICAL NOTES

1. The fastest way to obtain AB on a computer depends on the way in which
the computer stores matrices in its memory. The standard high-performance
algorithms, such as in LAPACK, calculate AB by columns, as in our definition
of the product. (A version of LAPACK written in C++ calculates AB by rows.)

2. The definition of AB lends itself well to parallel processing on a computer.
The columns of B are assigned individually or in groups to different
processors, which independently and hence simultaneously compute the
corresponding columns of AB.


PRACTICE PROBLEMS 


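1. Since vectors in $\mathbb{R}^n$ may be regarded as $n \times 1$ matrices, the properties of transposes
in Theorem 3 apply to vectors, too. Let

\[
A = \begin{bmatrix} 1 & -3 \\ -2 & 4 \end{bmatrix} \quad \text{and} \quad \mathbf{x} = \begin{bmatrix} 5 \\ 3 \end{bmatrix}
\]

Compute $(A\mathbf{x})^T$, $\mathbf{x}^T A^T$, $\mathbf{x}\mathbf{x}^T$, and $\mathbf{x}^T\mathbf{x}$. Is $A^T\mathbf{x}^T$ defined?

2. Let $A$ be a $4 \times 4$ matrix and let $\mathbf{x}$ be a vector in $\mathbb{R}^4$. What is the fastest way to
compute $A^2\mathbf{x}$? Count the multiplications.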

2.1 EXERCISES 






















































13. Let $\mathbf{r}_1, \ldots, \mathbf{r}_p$ be vectors in $\mathbb{R}^n$, and let $Q$ be an $m \times n$
matrix. Write the matrix $[\, Q\mathbf{r}_1 \;\; \cdots \;\; Q\mathbf{r}_p \,]$ as a product of
two matrices (neither of which is an identity matrix).

14. Let $U$ be the $3 \times 2$ cost matrix described in Example 6 in
Section 1.8. The first column of $U$ lists the costs per dollar of
output for manufacturing product B, and the second column
lists the costs per dollar of output for product C. (The costs
are categorized as materials, labor, and overhead.) Let $\mathbf{q}_1$
be a vector in $\mathbb{R}^2$ that lists the output (measured in dollars)
of products B and C manufactured during the first quarter
of the year, and let $\mathbf{q}_2$, $\mathbf{q}_3$, and $\mathbf{q}_4$ be the analogous vectors
that list the amounts of products B and C manufactured in
the second, third, and fourth quarters, respectively. Give an
economic description of the data in the matrix $UQ$, where
$Q = [\, \mathbf{q}_1 \;\; \mathbf{q}_2 \;\; \mathbf{q}_3 \;\; \mathbf{q}_4 \,]$.

Exercises 15 and 16 concern arbitrary matrices A, B, and C for 
which the indicated sums and products are defined. Mark each 
statement True or False. Justify each answer. 

15. a. If $A$ and $B$ are $2 \times 2$ matrices with columns $\mathbf{a}_1, \mathbf{a}_2$, and
$\mathbf{b}_1, \mathbf{b}_2$, respectively, then $AB = [\, \mathbf{a}_1\mathbf{b}_1 \;\; \mathbf{a}_2\mathbf{b}_2 \,]$.

b. Each column of $AB$ is a linear combination of the
columns of $B$ using weights from the corresponding
column of $A$.

c. $AB + AC = A(B + C)$

d. $A^T + B^T = (A + B)^T$

e. The transpose of a product of matrices equals the product
of their transposes in the same order.

16. a. The first row of $AB$ is the first row of $A$ multiplied on the
right by $B$.

b. If $A$ and $B$ are $3 \times 3$ matrices and $B = [\, \mathbf{b}_1 \;\; \mathbf{b}_2 \;\; \mathbf{b}_3 \,]$,
then $AB = [\, A\mathbf{b}_1 + A\mathbf{b}_2 + A\mathbf{b}_3 \,]$.

c. If $A$ is an $n \times n$ matrix, then $(A^2)^T = (A^T)^2$.

d. $(ABC)^T = C^T A^T B^T$

e. The transpose of a sum of matrices equals the sum of their
transposes.

17. If $A = \begin{bmatrix} 1 & -3 \\ -3 & 5 \end{bmatrix}$ and $AB = \begin{bmatrix} -3 & -11 \\ 1 & 17 \end{bmatrix}$, determine the
first and second columns of $B$.

18. Suppose the third column of $B$ is all zeros. What can be said
about the third column of $AB$?

19. Suppose the third column of $B$ is the sum of the first two
columns. What can be said about the third column of $AB$?
Why?

20. Suppose the first two columns, $\mathbf{b}_1$ and $\mathbf{b}_2$, of $B$ are equal.
What can be said about the columns of $AB$? Why?

21. Suppose the last column of $AB$ is entirely zeros but $B$ itself
has no column of zeros. What can be said about the columns
of $A$?


22. Show that if the columns of $B$ are linearly dependent, then
so are the columns of $AB$.

23. Suppose $CA = I_n$ (the $n \times n$ identity matrix). Show that the
equation $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. Explain why
$A$ cannot have more columns than rows.

24. Suppose $A$ is a $3 \times n$ matrix whose columns span $\mathbb{R}^3$. Explain
how to construct an $n \times 3$ matrix $D$ such that $AD = I_3$.

25. Suppose $A$ is an $m \times n$ matrix and there exist $n \times m$ matrices
$C$ and $D$ such that $CA = I_n$ and $AD = I_m$. Prove that
$m = n$ and $C = D$. [Hint: Think about the product $CAD$.]

26. Suppose $AD = I_m$ (the $m \times m$ identity matrix). Show that
for any $\mathbf{b}$ in $\mathbb{R}^m$, the equation $A\mathbf{x} = \mathbf{b}$ has a solution. [Hint:
Think about the equation $AD\mathbf{b} = \mathbf{b}$.] Explain why $A$ cannot
have more rows than columns.


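In Exercises 27 and 28, view vectors in $\mathbb{R}^n$ as $n \times 1$ matrices. For
$\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$, the matrix product $\mathbf{u}^T\mathbf{v}$ is a $1 \times 1$ matrix, called the
scalar product, or inner product, of $\mathbf{u}$ and $\mathbf{v}$. It is usually written
as a single real number without brackets. The matrix product $\mathbf{u}\mathbf{v}^T$
is an $n \times n$ matrix, called the outer product of $\mathbf{u}$ and $\mathbf{v}$. The
products $\mathbf{u}^T\mathbf{v}$ and $\mathbf{u}\mathbf{v}^T$ will appear later in the text.

27. Let $\mathbf{u} = \begin{bmatrix} -3 \\ 2 \\ -5 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. Compute $\mathbf{u}^T\mathbf{v}$, $\mathbf{v}^T\mathbf{u}$, $\mathbf{u}\mathbf{v}^T$,
and $\mathbf{v}\mathbf{u}^T$.

28. If $\mathbf{u}$ and $\mathbf{v}$ are in $\mathbb{R}^n$, how are $\mathbf{u}^T\mathbf{v}$ and $\mathbf{v}^T\mathbf{u}$ related? How are
$\mathbf{u}\mathbf{v}^T$ and $\mathbf{v}\mathbf{u}^T$ related?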

29. Prove Theorem 2(b) and 2(c). Use the row-column rule. The
$(i, j)$-entry in $A(B + C)$ can be written as

\[ a_{i1}(b_{1j} + c_{1j}) + \cdots + a_{in}(b_{nj} + c_{nj}) \quad \text{or} \quad \sum_{k=1}^{n} a_{ik}(b_{kj} + c_{kj}) \]

30. Prove Theorem 2(d). [Hint: The $(i, j)$-entry in $(rA)B$ is
$(ra_{i1})b_{1j} + \cdots + (ra_{in})b_{nj}$.]

31. Show that $I_m A = A$ when $A$ is an $m \times n$ matrix. Assume
$I_m\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^m$.

32. Show that $AI_n = A$ when $A$ is an $m \times n$ matrix. [Hint: Use
the (column) definition of $AI_n$.]

33. Prove Theorem 3(d). [Hint: Consider the $j$th row of $(AB)^T$.]

34. Give a formula for $(AB\mathbf{x})^T$, where $\mathbf{x}$ is a vector and $A$ and $B$
are matrices of appropriate sizes.

35. [M] Read the documentation for your matrix program, and
write the commands that will produce the following matrices
(without keying in each entry of the matrix).

a. A $4 \times 5$ matrix of zeros

b. A $5 \times 3$ matrix of ones

c. The $5 \times 5$ identity matrix

d. A $4 \times 4$ diagonal matrix, with diagonal entries 3, 4, 2, 5
d. A 4 x 4 diagonal matrix, with diagonal entries 3,4, 2, 5 
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A useful way to test new ideas in matrix algebra, or to make 
conjectures, is to make calculations with matrices selected at 
random. Checking a property for a few matrices does not prove 
that the property holds in general, but it makes the property more 
believable. Also, if the property is actually false, making a few 
calculations may help to discover this. 

36. [M] Write the command(s) that will create a $5 \times 6$ matrix
with random entries. In what range of numbers do the entries
lie? Tell how to create a $4 \times 4$ matrix with random integer
entries between $-9$ and $9$. [Hint: If $x$ is a random number
such that $0 < x < 1$, then $-9.5 < 19(x - .5) < 9.5$.]

37. [M] Construct random $4 \times 4$ matrices $A$ and $B$ to test
whether $AB = BA$. The best way to do this is to compute
$AB - BA$ and check whether this difference is the zero
matrix. Then test $AB - BA$ for three more pairs of random
$4 \times 4$ matrices. Report your conclusions.

38. [M] Construct a random $5 \times 5$ matrix $A$ and test whether
$(A + I)(A - I) = A^2 - I$. The best way to do this is to
compute $(A + I)(A - I) - (A^2 - I)$ and verify that this
difference is the zero matrix. Do this for three random
matrices. Then test $(A + B)(A - B) = A^2 - B^2$ the same
way for three pairs of random $4 \times 4$ matrices. Report your
conclusions.

39. [M] Use at least three pairs of random $4 \times 4$ matrices $A$
and $B$ to test the equalities $(A + B)^T = A^T + B^T$ and
$(AB)^T = B^T A^T$, as well as $(AB)^T = A^T B^T$. (See Exercise
37.) Report your conclusions. [Note: Most matrix programs
use A' for $A^T$.]

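40. [M] Let

\[
S = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

Compute $S^k$ for $k = 2, \ldots, 6$.

41. [M] Describe in words what happens when $A^5$, $A^{10}$, $A^{20}$, and
$A^{30}$ are computed for

A =

4 6 2
/ / 1

11 IX / /

2 3 6
/ / /

11 11 11

4 2 4
/ / /

11


SOLUTIONS TO PRACTICE PROBLEMS

1. $A\mathbf{x} = \begin{bmatrix} 1 & -3 \\ -2 & 4 \end{bmatrix}\begin{bmatrix} 5 \\ 3 \end{bmatrix} = \begin{bmatrix} -4 \\ 2 \end{bmatrix}$. So $(A\mathbf{x})^T = \begin{bmatrix} -4 & 2 \end{bmatrix}$. Also,

\[
\mathbf{x}^T A^T = \begin{bmatrix} 5 & 3 \end{bmatrix}\begin{bmatrix} 1 & -2 \\ -3 & 4 \end{bmatrix} = \begin{bmatrix} -4 & 2 \end{bmatrix}
\]

The quantities $(A\mathbf{x})^T$ and $\mathbf{x}^T A^T$ are equal, by Theorem 3(d). Next,

\[
\mathbf{x}\mathbf{x}^T = \begin{bmatrix} 5 \\ 3 \end{bmatrix}\begin{bmatrix} 5 & 3 \end{bmatrix} = \begin{bmatrix} 25 & 15 \\ 15 & 9 \end{bmatrix}
\]

\[
\mathbf{x}^T\mathbf{x} = \begin{bmatrix} 5 & 3 \end{bmatrix}\begin{bmatrix} 5 \\ 3 \end{bmatrix} = [\, 25 + 9 \,] = 34
\]

A $1 \times 1$ matrix such as $\mathbf{x}^T\mathbf{x}$ is usually written without the brackets. Finally, $A^T\mathbf{x}^T$ is
not defined, because $\mathbf{x}^T$ does not have two rows to match the two columns of $A^T$.

2. The fastest way to compute $A^2\mathbf{x}$ is to compute $A(A\mathbf{x})$. The product $A\mathbf{x}$ requires
16 multiplications, 4 for each entry, and $A(A\mathbf{x})$ requires 16 more. In contrast, the
product $A^2$ requires 64 multiplications, 4 for each of the 16 entries in $A^2$. After that,
$A^2\mathbf{x}$ takes 16 more multiplications, for a total of 80.


2.2 THE INVERSE OF A MATRIX


Matrix algebra provides tools for manipulating matrix equations and creating various
useful formulas in ways similar to doing ordinary algebra with real numbers. This
section investigates the matrix analogue of the reciprocal, or multiplicative inverse, of
a nonzero number.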
Recall that the multiplicative inverse of a number such as 5 is 1/5 or $5^{-1}$. This
inverse satisfies the equations

\[ 5^{-1} \cdot 5 = 1 \quad \text{and} \quad 5 \cdot 5^{-1} = 1 \]

The matrix generalization requires both equations and avoids the slanted-line notation
(for division) because matrix multiplication is not commutative. Furthermore, a full
generalization is possible only if the matrices involved are square.¹

An $n \times n$ matrix $A$ is said to be invertible if there is an $n \times n$ matrix $C$ such that

\[ CA = I \quad \text{and} \quad AC = I \]

where $I = I_n$, the $n \times n$ identity matrix. In this case, $C$ is an inverse of $A$. In fact, $C$
is uniquely determined by $A$, because if $B$ were another inverse of $A$, then $B = BI =
B(AC) = (BA)C = IC = C$. This unique inverse is denoted by $A^{-1}$, so that

\[ A^{-1}A = I \quad \text{and} \quad AA^{-1} = I \]

A matrix that is not invertible is sometimes called a singular matrix, and an invertible 
matrix is called a nonsingular matrix. 


EXAMPLE 1 If A 


Thus C = A~ x . 



,then 


and 


■ 


Here is a simple formula for the inverse of a 2 x 2 matrix, along with a test to tell 
if the inverse exists. 


THEOREM 4

Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. If $ad - bc \neq 0$, then $A$ is invertible and

\[
A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\]

If $ad - bc = 0$, then $A$ is not invertible.


The simple proof of Theorem 4 is outlined in Exercises 25 and 26. The quantity
$ad - bc$ is called the determinant of $A$, and we write

\[ \det A = ad - bc \]

Theorem 4 says that a $2 \times 2$ matrix $A$ is invertible if and only if $\det A \neq 0$.
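The formula in Theorem 4 is short enough to code directly. The following Python sketch is an illustration, not part of the text: the function name `inverse_2x2` is hypothetical, exact fractions are used so that no rounding occurs, and returning `None` for a singular matrix is a design choice of this sketch.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Return the inverse of a 2 x 2 matrix by Theorem 4, or None if det A = 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None                      # A is not invertible
    scale = Fraction(1) / det            # exact value of 1 / (ad - bc)
    return [[d * scale, -b * scale],
            [-c * scale, a * scale]]


# Illustrative matrices (assumptions of this sketch).
A = [[3, 4], [5, 6]]                     # det = 3(6) - 4(5) = -2
singular = [[6, -9], [-4, 6]]            # det = 36 - 36 = 0

print(inverse_2x2(A))
print(inverse_2x2(singular))
```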


¹ One could say that an $m \times n$ matrix $A$ is invertible if there exist $n \times m$ matrices $C$ and $D$ such that
$CA = I_n$ and $AD = I_m$. However, these equations imply that $A$ is square and $C = D$. Thus $A$ is invertible
as defined above. See Exercises 23-25 in Section 2.1.

EXAMPLE 2 Find the inverse of $A = \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}$.

SOLUTION Since $\det A = 3(6) - 4(5) = -2 \neq 0$, $A$ is invertible, and

\[
A^{-1} = \frac{1}{-2}\begin{bmatrix} 6 & -4 \\ -5 & 3 \end{bmatrix}
= \begin{bmatrix} 6/(-2) & -4/(-2) \\ -5/(-2) & 3/(-2) \end{bmatrix}
= \begin{bmatrix} -3 & 2 \\ 5/2 & -3/2 \end{bmatrix}
\]
■

Invertible matrices are indispensable in linear algebra, mainly for algebraic
calculations and formula derivations, as in the next theorem. There are also occasions when
an inverse matrix provides insight into a mathematical model of a real-life situation, as
in Example 3, below.

THEOREM 5 If $A$ is an invertible $n \times n$ matrix, then for each $\mathbf{b}$ in $\mathbb{R}^n$, the equation $A\mathbf{x} = \mathbf{b}$ has
the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$.

PROOF Take any $\mathbf{b}$ in $\mathbb{R}^n$. A solution exists because if $A^{-1}\mathbf{b}$ is substituted for $\mathbf{x}$,
then $A\mathbf{x} = A(A^{-1}\mathbf{b}) = (AA^{-1})\mathbf{b} = I\mathbf{b} = \mathbf{b}$. So $A^{-1}\mathbf{b}$ is a solution. To prove that the
solution is unique, show that if $\mathbf{u}$ is any solution, then $\mathbf{u}$, in fact, must be $A^{-1}\mathbf{b}$. Indeed,
if $A\mathbf{u} = \mathbf{b}$, we can multiply both sides by $A^{-1}$ and obtain

\[ A^{-1}A\mathbf{u} = A^{-1}\mathbf{b}, \quad I\mathbf{u} = A^{-1}\mathbf{b}, \quad \text{and} \quad \mathbf{u} = A^{-1}\mathbf{b} \]

EXAMPLE 3 A horizontal elastic beam is supported at each end and is subjected to
forces at points 1, 2, 3, as shown in Fig. 1. Let $\mathbf{f}$ in $\mathbb{R}^3$ list the forces at these points, and
let $\mathbf{y}$ in $\mathbb{R}^3$ list the amounts of deflection (that is, movement) of the beam at the three
points. Using Hooke’s law from physics, it can be shown that

\[ \mathbf{y} = D\mathbf{f} \]

where $D$ is a flexibility matrix. Its inverse is called the stiffness matrix. Describe the
physical significance of the columns of $D$ and $D^{-1}$.

[FIGURE 1: elastic beam with forces applied at points #1, #2, #3]

SOLUTION Write $I_3 = [\, \mathbf{e}_1 \;\; \mathbf{e}_2 \;\; \mathbf{e}_3 \,]$ and observe that

\[ D = DI_3 = [\, D\mathbf{e}_1 \;\; D\mathbf{e}_2 \;\; D\mathbf{e}_3 \,] \]

Interpret the vector $\mathbf{e}_1 = (1, 0, 0)$ as a unit force applied downward at point 1 on the
beam (with zero force at the other two points). Then $D\mathbf{e}_1$, the first column of $D$, lists
the beam deflections due to a unit force at point 1. Similar descriptions apply to the
second and third columns of $D$.

To study the stiffness matrix $D^{-1}$, observe that the equation $\mathbf{f} = D^{-1}\mathbf{y}$ computes a
force vector $\mathbf{f}$ when a deflection vector $\mathbf{y}$ is given. Write

\[ D^{-1} = D^{-1}I_3 = [\, D^{-1}\mathbf{e}_1 \;\; D^{-1}\mathbf{e}_2 \;\; D^{-1}\mathbf{e}_3 \,] \]

Now interpret $\mathbf{e}_1$ as a deflection vector. Then $D^{-1}\mathbf{e}_1$ lists the forces that create the
deflection. That is, the first column of $D^{-1}$ lists the forces that must be applied at the
three points to produce a unit deflection at point 1 and zero deflections at the other points.
Similarly, columns 2 and 3 of $D^{-1}$ list the forces required to produce unit deflections at
points 2 and 3, respectively. In each column, one or two of the forces must be negative
(point upward) to produce a unit deflection at the desired point and zero deflections at
the other two points. If the flexibility is measured, for example, in inches of deflection
per pound of load, then the stiffness matrix entries are given in pounds of load per inch
of deflection. ■

The formula in Theorem 5 is seldom used to solve an equation $A\mathbf{x} = \mathbf{b}$ numerically
because row reduction of $[\, A \;\; \mathbf{b} \,]$ is nearly always faster. (Row reduction is usually
more accurate, too, when computations involve rounding off numbers.) One possible
exception is the $2 \times 2$ case. In this case, mental computations to solve $A\mathbf{x} = \mathbf{b}$ are
sometimes easier using the formula for $A^{-1}$, as in the next example.

EXAMPLE 4 Use the inverse of the matrix $A$ in Example 2 to solve the system

\[
\begin{aligned}
3x_1 + 4x_2 &= 3 \\
5x_1 + 6x_2 &= 7
\end{aligned}
\]

SOLUTION This system is equivalent to $A\mathbf{x} = \mathbf{b}$, so

\[
\mathbf{x} = A^{-1}\mathbf{b} = \begin{bmatrix} -3 & 2 \\ 5/2 & -3/2 \end{bmatrix}\begin{bmatrix} 3 \\ 7 \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \end{bmatrix}
\]
■

The next theorem provides three useful facts about invertible matrices.

THEOREM 6

a. If $A$ is an invertible matrix, then $A^{-1}$ is invertible and

\[ (A^{-1})^{-1} = A \]

b. If $A$ and $B$ are $n \times n$ invertible matrices, then so is $AB$, and the inverse of $AB$
is the product of the inverses of $A$ and $B$ in the reverse order. That is,

\[ (AB)^{-1} = B^{-1}A^{-1} \]

c. If $A$ is an invertible matrix, then so is $A^T$, and the inverse of $A^T$ is the transpose
of $A^{-1}$. That is,

\[ (A^T)^{-1} = (A^{-1})^T \]

PROOF To verify statement (a), find a matrix $C$ such that

\[ A^{-1}C = I \quad \text{and} \quad CA^{-1} = I \]

In fact, these equations are satisfied with $A$ in place of $C$. Hence $A^{-1}$ is invertible, and
$A$ is its inverse. Next, to prove statement (b), compute:

\[ (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I \]

A similar calculation shows that $(B^{-1}A^{-1})(AB) = I$. For statement (c), use Theorem
3(d), read from right to left, $(A^{-1})^T A^T = (AA^{-1})^T = I^T = I$. Similarly, $A^T(A^{-1})^T =
I^T = I$. Hence $A^T$ is invertible, and its inverse is $(A^{-1})^T$. ■

The following generalization of Theorem 6(b) is needed later.

The product of $n \times n$ invertible matrices is invertible, and the inverse is the
product of their inverses in the reverse order.
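The reverse-order rule for inverses can be checked on a pair of 2 x 2 matrices. This Python sketch is only an illustration: the helpers `mat_mul` and `inverse_2x2` are hypothetical, the inverse is computed from the 2 x 2 determinant formula with exact fractions, and both sample matrices are assumed to be invertible.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Return AB using the row-column rule (matrices are lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(A):
    """Return the inverse of an invertible 2 x 2 matrix (det A assumed nonzero)."""
    (a, b), (c, d) = A
    scale = Fraction(1) / (a * d - b * c)
    return [[d * scale, -b * scale], [-c * scale, a * scale]]


# Illustrative invertible matrices: det A = -2 and det B = 6.
A = [[3, 4], [5, 6]]
B = [[2, 0], [4, 3]]

inverse_of_product = inverse_2x2(mat_mul(A, B))                  # (AB)^{-1}
reverse_order = mat_mul(inverse_2x2(B), inverse_2x2(A))          # B^{-1} A^{-1}
same_order = mat_mul(inverse_2x2(A), inverse_2x2(B))             # A^{-1} B^{-1}

print(inverse_of_product == reverse_order)
print(inverse_of_product == same_order)
```

For this pair the product of the inverses in the original order gives a different matrix, which is why the order in the rule matters.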
There is an important connection between invertible matrices and row operations
that leads to a method for computing inverses. As we shall see, an invertible matrix
$A$ is row equivalent to an identity matrix, and we can find $A^{-1}$ by watching the row
reduction of $A$ to $I$.


Elementary Matrices 

An elementary matrix is one that is obtained by performing a single elementary row 
operation on an identity matrix. The next example illustrates the three kinds of elemen¬ 
tary matrices. 


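EXAMPLE 5 Let

\[
E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \quad
E_2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \quad
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\]

Compute $E_1A$, $E_2A$, and $E_3A$, and describe how these products can be obtained by
elementary row operations on $A$.

SOLUTION Verify that

\[
E_1A = \begin{bmatrix} a & b & c \\ d & e & f \\ g - 4a & h - 4b & i - 4c \end{bmatrix}, \quad
E_2A = \begin{bmatrix} d & e & f \\ a & b & c \\ g & h & i \end{bmatrix}, \quad
E_3A = \begin{bmatrix} a & b & c \\ d & e & f \\ 5g & 5h & 5i \end{bmatrix}
\]

Addition of $-4$ times row 1 of $A$ to row 3 produces $E_1A$. (This is a row replacement
operation.) An interchange of rows 1 and 2 of $A$ produces $E_2A$, and multiplication of
row 3 of $A$ by 5 produces $E_3A$. ■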


Left-multiplication (that is, multiplication on the left) by $E_1$ in Example 5 has the
same effect on any $3 \times n$ matrix. It adds $-4$ times row 1 to row 3. In particular, since
$E_1 \cdot I = E_1$, we see that $E_1$ itself is produced by this same row operation on the identity.
Thus Example 5 illustrates the following general fact about elementary matrices. See
Exercises 27 and 28.


If an elementary row operation is performed on an $m \times n$ matrix $A$, the resulting
matrix can be written as $EA$, where the $m \times m$ matrix $E$ is created by performing
the same row operation on $I_m$.


Since row operations are reversible, as shown in Section 1.1, elementary matrices
are invertible, for if $E$ is produced by a row operation on $I$, then there is another row
operation of the same type that changes $E$ back into $I$. Hence there is an elementary
matrix $F$ such that $FE = I$. Since $E$ and $F$ correspond to reverse operations, $EF = I$,
too.
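The connection between row operations and elementary matrices can be seen in a few lines of code. In the sketch below, the helpers `identity`, `row_replace`, and `mat_mul` are hypothetical names, rows are indexed from 0 as is usual in Python (so "row 3" of the text is index 2), and the numerical matrix `A` is made up for the illustration.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def row_replace(M, target, source, c):
    """Return a copy of M in which c times row `source` is added to row `target`."""
    result = [row[:] for row in M]
    result[target] = [t + c * s for t, s in zip(M[target], M[source])]
    return result


def mat_mul(A, B):
    """Return AB using the row-column rule."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]        # illustrative matrix

E1 = row_replace(identity(3), 2, 0, -4)      # add -4 times row 1 to row 3 of I
F = row_replace(identity(3), 2, 0, 4)        # the reverse operation

print(mat_mul(E1, A) == row_replace(A, 2, 0, -4))   # E1 A performs the row operation
print(mat_mul(F, E1) == identity(3), mat_mul(E1, F) == identity(3))
```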
Each elementary matrix $E$ is invertible. The inverse of $E$ is the elementary matrix
of the same type that transforms $E$ back into $I$.


EXAMPLE 6 Find the inverse of $E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}$.

SOLUTION To transform $E_1$ into $I$, add $+4$ times row 1 to row 3. The elementary
matrix that does this is

\[
E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ +4 & 0 & 1 \end{bmatrix}
\]
■


The following theorem provides the best way to “visualize” an invertible matrix,
and the theorem leads immediately to a method for finding the inverse of a matrix.


THEOREM 7

An $n \times n$ matrix $A$ is invertible if and only if $A$ is row equivalent to $I_n$, and in
this case, any sequence of elementary row operations that reduces $A$ to $I_n$ also
transforms $I_n$ into $A^{-1}$.


PROOF Suppose that $A$ is invertible. Then, since the equation $A\mathbf{x} = \mathbf{b}$ has a solution
for each $\mathbf{b}$ (Theorem 5), $A$ has a pivot position in every row (Theorem 4 in Section 1.4).
Because $A$ is square, the $n$ pivot positions must be on the diagonal, which implies that
the reduced echelon form of $A$ is $I_n$. That is, $A \sim I_n$.

Now suppose, conversely, that $A \sim I_n$. Then, since each step of the row reduction
of $A$ corresponds to left-multiplication by an elementary matrix, there exist elementary
matrices $E_1, \ldots, E_p$ such that

\[ A \sim E_1A \sim E_2(E_1A) \sim \cdots \sim E_p(E_{p-1} \cdots E_1A) = I_n \]

That is,

\[ E_p \cdots E_1A = I_n \qquad (1) \]

Since the product $E_p \cdots E_1$ of invertible matrices is invertible, (1) leads to

\[ (E_p \cdots E_1)^{-1}(E_p \cdots E_1)A = (E_p \cdots E_1)^{-1}I_n \]

\[ A = (E_p \cdots E_1)^{-1} \]

Thus $A$ is invertible, as it is the inverse of an invertible matrix (Theorem 6). Also,

\[ A^{-1} = [\,(E_p \cdots E_1)^{-1}\,]^{-1} = E_p \cdots E_1 \]

Then $A^{-1} = E_p \cdots E_1 \cdot I_n$, which says that $A^{-1}$ results from applying $E_1, \ldots, E_p$
successively to $I_n$. This is the same sequence in (1) that reduced $A$ to $I_n$. ■

An Algorithm for Finding $A^{-1}$


If we place $A$ and $I$ side-by-side to form an augmented matrix $[\, A \;\; I \,]$, then row
operations on this matrix produce identical operations on $A$ and on $I$. By Theorem 7,
either there are row operations that transform $A$ to $I_n$ and $I_n$ to $A^{-1}$ or else $A$ is not
invertible.
ALGORITHM FOR FINDING $A^{-1}$

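Row reduce the augmented matrix $[\, A \;\; I \,]$. If $A$ is row equivalent to $I$, then
$[\, A \;\; I \,]$ is row equivalent to $[\, I \;\; A^{-1} \,]$. Otherwise, $A$ does not have an inverse.


EXAMPLE 7 Find the inverse of the matrix $A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 3 \\ 4 & -3 & 8 \end{bmatrix}$, if it exists.

SOLUTION

\[
[\, A \;\; I \,] = \begin{bmatrix} 0 & 1 & 2 & 1 & 0 & 0 \\ 1 & 0 & 3 & 0 & 1 & 0 \\ 4 & -3 & 8 & 0 & 0 & 1 \end{bmatrix}
\sim \begin{bmatrix} 1 & 0 & 3 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 4 & -3 & 8 & 0 & 0 & 1 \end{bmatrix}
\]

\[
\sim \begin{bmatrix} 1 & 0 & 3 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & -3 & -4 & 0 & -4 & 1 \end{bmatrix}
\sim \begin{bmatrix} 1 & 0 & 3 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & 0 & 2 & 3 & -4 & 1 \end{bmatrix}
\]

\[
\sim \begin{bmatrix} 1 & 0 & 3 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 3/2 & -2 & 1/2 \end{bmatrix}
\sim \begin{bmatrix} 1 & 0 & 0 & -9/2 & 7 & -3/2 \\ 0 & 1 & 0 & -2 & 4 & -1 \\ 0 & 0 & 1 & 3/2 & -2 & 1/2 \end{bmatrix}
\]

Theorem 7 shows, since $A \sim I$, that $A$ is invertible, and

\[
A^{-1} = \begin{bmatrix} -9/2 & 7 & -3/2 \\ -2 & 4 & -1 \\ 3/2 & -2 & 1/2 \end{bmatrix}
\]

It is a good idea to check the final answer:

\[
AA^{-1} = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 3 \\ 4 & -3 & 8 \end{bmatrix}
\begin{bmatrix} -9/2 & 7 & -3/2 \\ -2 & 4 & -1 \\ 3/2 & -2 & 1/2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

It is not necessary to check that $A^{-1}A = I$ since $A$ is invertible. ■


Another View of Matrix Inversion

Denote the columns of $I_n$ by $\mathbf{e}_1, \ldots, \mathbf{e}_n$. Then row reduction of $[\, A \;\; I \,]$ to $[\, I \;\; A^{-1} \,]$
can be viewed as the simultaneous solution of the $n$ systems

\[ A\mathbf{x} = \mathbf{e}_1, \quad A\mathbf{x} = \mathbf{e}_2, \quad \ldots, \quad A\mathbf{x} = \mathbf{e}_n \qquad (2) \]

where the “augmented columns” of these systems have all been placed next to $A$ to form
$[\, A \;\; \mathbf{e}_1 \;\; \mathbf{e}_2 \;\; \cdots \;\; \mathbf{e}_n \,] = [\, A \;\; I \,]$. The equation $AA^{-1} = I$ and the definition of matrix
multiplication show that the columns of $A^{-1}$ are precisely the solutions of the systems
in (2). This observation is useful because some applied problems may require finding
only one or two columns of $A^{-1}$. In this case, only the corresponding systems in (2)
need be solved.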
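The algorithm for finding the inverse by row reducing the augmented matrix can be sketched in Python as follows. This is one possible implementation under stated assumptions, not the text's own program: the function name `inverse` is hypothetical, exact fractions are used to avoid rounding, the first nonzero entry found in a column is taken as the pivot, and `None` is returned when no pivot exists (so the matrix is not invertible).

```python
from fractions import Fraction


def inverse(A):
    """Row reduce [A I] to [I A^{-1}]; return A^{-1}, or None if A is not invertible."""
    n = len(A)
    # Build the augmented matrix [A I] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                              # fewer than n pivots
        M[col], M[pivot] = M[pivot], M[col]          # interchange
        p = M[col][col]
        M[col] = [v / p for v in M[col]]             # scale the pivot to 1
        for r in range(n):                           # replacement: clear the column
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]                    # right half is A^{-1}


A = [[0, 1, 2], [1, 0, 3], [4, -3, 8]]
A_inv = inverse(A)
print(A_inv)
print(inverse([[1, 2], [2, 4]]))                     # a matrix with no inverse
```

As the numerical notes in this section point out, practical software usually solves $A\mathbf{x} = \mathbf{b}$ by row reduction rather than forming $A^{-1}$; the sketch is meant only to mirror the hand algorithm.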
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WEB 


NUMERICAL NOTE

In practical work, $A^{-1}$ is seldom computed, unless the entries of $A^{-1}$ are needed.
Computing both $A^{-1}$ and $A^{-1}\mathbf{b}$ takes about three times as many arithmetic
operations as solving $A\mathbf{x} = \mathbf{b}$ by row reduction, and row reduction may be more
accurate.


PRACTICE PROBLEMS 


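1. Use determinants to determine which of the following matrices are invertible.

a. $\begin{bmatrix} 3 & -9 \\ 2 & 6 \end{bmatrix}$ b. $\begin{bmatrix} 4 & -9 \\ 0 & 5 \end{bmatrix}$ c. $\begin{bmatrix} 6 & -9 \\ -4 & 6 \end{bmatrix}$

2. Find the inverse of the matrix $A = \begin{bmatrix} 1 & -2 & -1 \\ -1 & 5 & 6 \\ 5 & -4 & 5 \end{bmatrix}$, if it exists.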


2.2 EXERCISES 


Find the inverses of the matrices in Exercises 1-4. 


8 6 
5 4 



2 

5 


3. 




4 . 


-4 

—6 


5. Use the inverse found in Exercise 1 to solve the system

\[
\begin{aligned}
8x_1 + 6x_2 &= 2 \\
5x_1 + 4x_2 &= -1
\end{aligned}
\]

6. Use the inverse found in Exercise 3 to solve the system

\[
\begin{aligned}
7x_1 + 3x_2 &= -9 \\
-6x_1 - 3x_2 &= 4
\end{aligned}
\]


7. Let A : 


2 

12 


bi 


b 2 


b 3 


and b 4 = ^ . 

a. Find A^{-1}, and use it to solve the four equations

Ax = b_1,  Ax = b_2,  Ax = b_3,  Ax = b_4

b. The four equations in part (a) can be solved by the
same set of row operations, since the coefficient matrix
is the same in each case. Solve the four equations
in part (a) by row reducing the augmented matrix
[A b_1 b_2 b_3 b_4].


8. Suppose P is invertible and A = PBP^{-1}. Solve for B in
terms of A.


In Exercises 9 and 10, mark each statement True or False. Justify
each answer.

9. a. In order for a matrix B to be the inverse of A, the
equations AB = I and BA = I must both be true.

b. If A and B are n x n and invertible, then A^{-1}B^{-1} is the
inverse of AB.

c. If A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} and ab - cd ≠ 0, then A is invertible.

d. If A is an invertible n x n matrix, then the equation
Ax = b is consistent for each b in R^n.

e. Each elementary matrix is invertible.

10. a. If A is invertible, then elementary row operations that
reduce A to the identity I_n also reduce A^{-1} to I_n.

b. If A is invertible, then the inverse of A^{-1} is A itself.

c. A product of invertible n x n matrices is invertible, and
the inverse of the product is the product of their inverses
in the same order.

d. If A is an n x n matrix and Ax = e_j is consistent for
every j in {1, 2, ..., n}, then A is invertible. Note:
e_1, ..., e_n represent the columns of the identity matrix.

e. If A can be row reduced to the identity matrix, then A
must be invertible.

11. Let A be an invertible n x n matrix, and let B be an n x p
matrix. Show that the equation AX = B has a unique solution
A^{-1}B.

12. Use matrix algebra to show that if A is invertible and D
satisfies AD = I, then D = A^{-1}.

13. Suppose AB = AC, where B and C are n x p matrices and
A is invertible. Show that B = C. Is this true, in general,
when A is not invertible?

14. Suppose (B - C)D = 0, where B and C are m x n matrices
and D is invertible. Show that B = C.

15. Let A be an invertible n x n matrix, and let B be an n x p
matrix. Explain why A^{-1}B can be computed by row reduction:

If [A B] ~ ... ~ [I X], then X = A^{-1}B.

If A is larger than 2 x 2, then row reduction of [A B] is
much faster than computing both A^{-1} and A^{-1}B.

16. Suppose A and B are n x n matrices, B is invertible, and AB
is invertible. Show that A is invertible. [Hint: Let C = AB,
and solve this equation for A.]

17. Suppose A, B, and C are invertible n x n matrices. Show
that ABC is also invertible by producing a matrix D such
that (ABC)D = I and D(ABC) = I.

18. Solve the equation AB = BC for A, assuming that A, B, and
C are square and B is invertible.

19. If A, B, and C are n x n invertible matrices, does the equation
C^{-1}(A + X)B^{-1} = I_n have a solution, X? If so, find
it.

20. Suppose A, B, and X are n x n matrices with A, X, and
A - AX invertible, and suppose

(A - AX)^{-1} = X^{-1}B        (3)

a. Explain why B is invertible.

b. Solve equation (3) for X. If a matrix needs to be inverted,
explain why that matrix is invertible.

21. Explain why the columns of an n x n matrix A are linearly
independent when A is invertible.

22. Explain why the columns of an n x n matrix A span R^n when
A is invertible. [Hint: Review Theorem 4 in Section 1.4.]

23. Suppose A is n x n and the equation Ax = 0 has only the
trivial solution. Explain why A has n pivot columns and A is
row equivalent to I_n. By Theorem 7, this shows that A must
be invertible. (This exercise and Exercise 24 will be cited in
Section 2.3.)

24. Suppose A is n x n and the equation Ax = b has a solution
for each b in R^n. Explain why A must be invertible. [Hint:
Is A row equivalent to I_n?]

Exercises 25 and 26 prove Theorem 4 for A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.

25. Show that if ad - bc = 0, then the equation Ax = 0 has
more than one solution. Why does this imply that A is not
invertible? [Hint: First, consider a = b = 0. Then, if a and
b are not both zero, consider the vector x = \begin{bmatrix} -b \\ a \end{bmatrix}.]

26. Show that if ad - bc ≠ 0, the formula for A^{-1} works.

Exercises 27 and 28 prove special cases of the facts about elementary
matrices stated in the box following Example 5. Here A is a
3 x 3 matrix and I = I_3. (A general proof would require slightly
more notation.)


27. Let A be a 3 x 3 matrix.

a. Use equation (2) from Section 2.1 to show that
row_i(A) = row_i(I) · A, for i = 1, 2, 3.

b. Show that if rows 1 and 2 of A are interchanged, then the
result may be written as EA, where E is an elementary
matrix formed by interchanging rows 1 and 2 of I.

c. Show that if row 3 of A is multiplied by 5, then the result
may be written as EA, where E is formed by multiplying
row 3 of I by 5.


28. Suppose row 2 of A is replaced by row_2(A) - 3 · row_1(A).
Show that the result is EA, where E is formed from I by
replacing row_2(I) by row_2(I) - 3 · row_1(I).

Find the inverses of the matrices in Exercises 29-32, if they exist. 
Use the algorithm introduced in this section. 


29. 




30. 



6 


31. \begin{bmatrix} 1 & 0 & -2 \\ -3 & 1 & 4 \\ 2 & -3 & 4 \end{bmatrix}

32. \begin{bmatrix} 1 & 2 & -1 \\ -4 & -7 & 3 \\ -2 & -6 & 4 \end{bmatrix}


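33. Use the algorithm from this section to find the inverses of

\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}   and   \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}

Let A be the corresponding n x n matrix, and let B be its
inverse. Guess the form of B, and then show that AB = I.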


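34. Repeat the strategy of Exercise 33 to guess the inverse B of

A = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 2 & 2 & 0 & & 0 \\ 3 & 3 & 3 & & 0 \\ \vdots & & & \ddots & \vdots \\ n & n & n & \cdots & n \end{bmatrix}

Show that AB = I.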

"-1 -7 

35. Let A = 2 15 


0 一 

0 

0 


n 


.Find the third column of A 


3 
6 
2 

without computing the other columns. 



36. [M] Let A = \begin{bmatrix} -25 & -9 & -27 \\ 536 & 185 & 537 \\ 154 & 52 & 143 \end{bmatrix}. Find the second and
third columns of A^{-1} without computing the first column.


37. Let A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \\ 1 & 5 \end{bmatrix}. Construct a 2 x 3 matrix C (by trial and
error) using only 1, -1, and 0 as entries, such that CA = I_2.
Compute AC and note that AC ≠ I_3.

38. Let A


0 


1 

-1 


Construct a 4 x 2 matrix D using only 1 and 0 as entries, such that AD = I_2. Is it
possible that CA = I_4 for some 4 x 2 matrix C? Why or
why not?

39. [M] Let

D = \begin{bmatrix} .011 & .003 & .001 \\ .003 & .009 & .003 \\ .001 & .003 & .011 \end{bmatrix}

be a flexibility matrix, with flexibility measured in inches per
pound. Suppose that forces of 40, 50, and 30 lb are applied at
points 1, 2, and 3, respectively, in Fig. 1 of Example 3. Find
the corresponding deflections.

40. [M] Compute the stiffness matrix D^{-1} for D in Exercise 39.
List the forces needed to produce a deflection of .04 in. at
point 3, with zero deflections at the other points.


41. [M] Let

D = \begin{bmatrix} .0130 & .0050 & .0020 & .0010 \\ .0050 & .0100 & .0040 & .0020 \\ .0020 & .0040 & .0100 & .0050 \\ .0010 & .0020 & .0050 & .0130 \end{bmatrix}

be a flexibility matrix for an elastic beam such as the one in
Example 3, with four points at which force is applied. Units
are centimeters per newton of force. Measurements at the
four points show deflections of .07, .12, .16, and .12 cm.
Determine the forces at the four points.


42. [M] With D as in Exercise 41, determine the forces that
produce a deflection of .22 cm at the second point on the
beam, with zero deflections at the other three points. How is
the answer related to the entries in D^{-1}? [Hint: First answer
the question when the deflection is 1 cm at the second point.]


SOLUTIONS TO PRACTICE PROBLEMS 


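1. a. det \begin{bmatrix} 3 & -9 \\ 2 & 6 \end{bmatrix} = 3 · 6 - (-9) · 2 = 18 + 18 = 36. The determinant is nonzero, so
the matrix is invertible.

b. det \begin{bmatrix} 4 & -9 \\ 0 & 5 \end{bmatrix} = 4 · 5 - (-9) · 0 = 20 ≠ 0. The matrix is invertible.

c. det \begin{bmatrix} 6 & -9 \\ -4 & 6 \end{bmatrix} = 6 · 6 - (-9)(-4) = 36 - 36 = 0. The matrix is not invertible.

2. \[
[A \; I] =
\begin{bmatrix} 1 & -2 & -1 & 1 & 0 & 0 \\ -1 & 5 & 6 & 0 & 1 & 0 \\ 5 & -4 & 5 & 0 & 0 & 1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -2 & -1 & 1 & 0 & 0 \\ 0 & 3 & 5 & 1 & 1 & 0 \\ 0 & 6 & 10 & -5 & 0 & 1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -2 & -1 & 1 & 0 & 0 \\ 0 & 3 & 5 & 1 & 1 & 0 \\ 0 & 0 & 0 & -7 & -2 & 1 \end{bmatrix}
\]

So [A I] is row equivalent to a matrix of the form [B D], where B is square
and has a row of zeros. Further row operations will not transform B into I, so we
stop. A does not have an inverse.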


2.3 CHARACTERIZATIONS OF INVERTIBLE MATRICES 


This section provides a review of most of the concepts introduced in Chapter 1, in
relation to systems of n linear equations in n unknowns and to square matrices. The
main result is Theorem 8.

THEOREM 8 The Invertible Matrix Theorem

Let A be a square n x n matrix. Then the following statements are equivalent.
That is, for a given A, the statements are either all true or all false.

a. A is an invertible matrix.

b. A is row equivalent to the n x n identity matrix.

c. A has n pivot positions.

d. The equation Ax = 0 has only the trivial solution.

e. The columns of A form a linearly independent set.

f. The linear transformation x ↦ Ax is one-to-one.

g. The equation Ax = b has at least one solution for each b in R^n.

h. The columns of A span R^n.

i. The linear transformation x ↦ Ax maps R^n onto R^n.

j. There is an n x n matrix C such that CA = I.

k. There is an n x n matrix D such that AD = I.

l. A^T is an invertible matrix.


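FIGURE 1 Implications used in the proof: the circle (a) ⇒ (j) ⇒ (d) ⇒ (c) ⇒ (b) ⇒ (a);
the links (a) ⇒ (k) ⇒ (g) ⇒ (a); and the equivalences (g) ⇔ (h) ⇔ (i),
(d) ⇔ (e) ⇔ (f), and (a) ⇔ (l).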


First, we need some notation. If the truth of statement (a) always implies that
statement (j) is true, we say that (a) implies (j) and write (a) ⇒ (j). The proof will
establish the “circle” of implications shown in Fig. 1. If any one of these five statements
is true, then so are the others. Finally, the proof will link the remaining statements of
the theorem to the statements in this circle.

PROOF If statement (a) is true, then A^{-1} works for C in (j), so (a) ⇒ (j). Next, (j) ⇒ (d)
by Exercise 23 in Section 2.1. (Turn back and read the exercise.) Also, (d) ⇒ (c) by
Exercise 23 in Section 2.2. If A is square and has n pivot positions, then the pivots
must lie on the main diagonal, in which case the reduced echelon form of A is I_n. Thus
(c) ⇒ (b). Also, (b) ⇒ (a) by Theorem 7 in Section 2.2. This completes the circle in
Fig. 1.

Next, (a) ⇒ (k) because A^{-1} works for D. Also, (k) ⇒ (g) by Exercise 26 in Section 2.1,
and (g) ⇒ (a) by Exercise 24 in Section 2.2. So (k) and (g) are linked to
the circle. Further, (g), (h), and (i) are equivalent for any matrix, by Theorem 4 in
Section 1.4 and Theorem 12(a) in Section 1.9. Thus, (h) and (i) are linked through (g)
to the circle.

Since (d) is linked to the circle, so are (e) and (f), because (d), (e), and (f) are
all equivalent for any matrix A. (See Section 1.7 and Theorem 12(b) in Section 1.9.)
Finally, (a) ⇒ (l) by Theorem 6(c) in Section 2.2, and (l) ⇒ (a) by the same theorem
with A and A^T interchanged. This completes the proof. ■

Because of Theorem 5 in Section 2.2, statement (g) in Theorem 8 could also be
written as “The equation Ax = b has a unique solution for each b in R^n.” This statement
certainly implies (b) and hence implies that A is invertible.

The next fact follows from Theorem 8 and Exercise 12 in Section 2.2.


Let A and B be square matrices. If AB = I, then A and B are both invertible,
with B = A^{-1} and A = B^{-1}.





The Invertible Matrix Theorem divides the set of all n x n matrices into two disjoint
classes: the invertible (nonsingular) matrices, and the noninvertible (singular) matrices.
Each statement in the theorem describes a property of every n x n invertible matrix.
The negation of a statement in the theorem describes a property of every n x n singular
matrix. For instance, an n x n singular matrix is not row equivalent to I_n, does not have
n pivot positions, and has linearly dependent columns. Negations of other statements
are considered in the exercises.


Expanded Table for the IMT 2-10


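EXAMPLE 1 Use the Invertible Matrix Theorem to decide if A is invertible:

\[
A = \begin{bmatrix} 1 & 0 & -2 \\ 3 & 1 & -2 \\ -5 & -1 & 9 \end{bmatrix}
\]

SOLUTION

\[
A \sim \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 4 \\ 0 & -1 & -1 \end{bmatrix}
\sim \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 4 \\ 0 & 0 & 3 \end{bmatrix}
\]

So A has three pivot positions and hence is invertible, by the Invertible Matrix Theorem,
statement (c). ■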
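The pivot-counting test of statement (c) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `count_pivots` is hypothetical, exact fractions are used to avoid roundoff, and the second matrix `S` (three identical rows) is included only as a singular contrast case.

```python
from fractions import Fraction

def count_pivots(A):
    """Row reduce A to echelon form and return the number of pivot positions.
    Hypothetical helper using exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots = 0
    for col in range(cols):
        # Look for a nonzero entry in this column, at or below the next pivot row.
        pivot = next((r for r in range(pivots, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[pivots], M[pivot] = M[pivot], M[pivots]
        # Create zeros below the pivot.
        for r in range(pivots + 1, rows):
            f = M[r][col] / M[pivots][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[pivots])]
        pivots += 1
        if pivots == rows:
            break
    return pivots

A = [[1, 0, -2], [3, 1, -2], [-5, -1, 9]]   # the matrix of Example 1
S = [[2, 3, 4], [2, 3, 4], [2, 3, 4]]       # identical rows, so singular

print(count_pivots(A) == len(A))  # n pivots: invertible by statement (c)
print(count_pivots(S) == len(S))  # fewer than n pivots: not invertible
```

With floating-point entries the comparison `!= 0` would need a tolerance, which is where the roundoff issues of ill-conditioned matrices enter.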

The power of the Invertible Matrix Theorem lies in the connections it provides
among so many important concepts, such as linear independence of columns of a matrix
A and the existence of solutions to equations of the form Ax = b. It should be emphasized,
however, that the Invertible Matrix Theorem applies only to square matrices. For
example, if the columns of a 4 x 3 matrix are linearly independent, we cannot use the
Invertible Matrix Theorem to conclude anything about the existence or nonexistence of
solutions to equations of the form Ax = b.


Invertible Linear Transformations 

Recall from Section 2.1 that matrix multiplication corresponds to composition of linear
transformations. When a matrix A is invertible, the equation A^{-1}Ax = x can be viewed
as a statement about linear transformations. See Fig. 2.


FIGURE 2 A^{-1} transforms Ax back to x. (Multiplication by A carries x to Ax;
multiplication by A^{-1} carries Ax back to x.)

A linear transformation T : R^n → R^n is said to be invertible if there exists a function
S : R^n → R^n such that

S(T(x)) = x for all x in R^n        (1)

T(S(x)) = x for all x in R^n        (2)


The next theorem shows that if such an S exists, it is unique and must be a linear
transformation. We call S the inverse of T and write it as T^{-1}.

THEOREM 9

Let T : R^n → R^n be a linear transformation and let A be the standard matrix for
T. Then T is invertible if and only if A is an invertible matrix. In that case, the
linear transformation S given by S(x) = A^{-1}x is the unique function satisfying
equations (1) and (2).


PROOF Suppose that T is invertible. Then (2) shows that T is onto R^n, for if b is in
R^n and x = S(b), then T(x) = T(S(b)) = b, so each b is in the range of T. Thus A is
invertible, by the Invertible Matrix Theorem, statement (i).

Conversely, suppose that A is invertible, and let S(x) = A^{-1}x. Then, S is a linear
transformation, and S obviously satisfies (1) and (2). For instance,

S(T(x)) = S(Ax) = A^{-1}(Ax) = x

Thus T is invertible. The proof that S is unique is outlined in Exercise 38. ■
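Equations (1) and (2) can be checked numerically for a small case. The sketch below is illustrative only: the 2 x 2 standard matrix `A`, the test vector `x`, and the helper names are assumptions, and the inverse is computed with the 2 x 2 determinant formula using exact fractions.

```python
from fractions import Fraction

# Illustrative standard matrix (an assumption for this sketch); det A = 1.
A = [[2, 1], [5, 3]]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix by the determinant formula (hypothetical helper)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(M, x):
    """Return the matrix-vector product Mx."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A_inv = inverse_2x2(A)

def T(x):
    return apply(A, x)       # T(x) = Ax

def S(x):
    return apply(A_inv, x)   # S(x) = A^{-1}x

x = [7, -4]
print(S(T(x)), T(S(x)))      # both should reproduce x
```

A check on one vector is of course not a proof; the argument in the text shows the identities hold for every x.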

EXAMPLE 2 What can you say about a one-to-one linear transformation T from
R^n into R^n?

SOLUTION The columns of the standard matrix A of T are linearly independent (by
Theorem 12 in Section 1.9). So A is invertible, by the Invertible Matrix Theorem, and
T maps R^n onto R^n. Also, T is invertible, by Theorem 9. ■


NUMERICAL NOTES

In practical work, you might occasionally encounter a “nearly singular” or
ill-conditioned matrix, that is, an invertible matrix that can become singular if some of
its entries are changed ever so slightly. In this case, row reduction may produce
fewer than n pivot positions, as a result of roundoff error. Also, roundoff error
can sometimes make a singular matrix appear to be invertible.

Some matrix programs will compute a condition number for a square
matrix. The larger the condition number, the closer the matrix is to being singular.
The condition number of the identity matrix is 1. A singular matrix has an
infinite condition number. In extreme cases, a matrix program may not be able
to distinguish between a singular matrix and an ill-conditioned matrix.

Exercises 41-45 show that matrix computations can produce substantial error
when a condition number is large.
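A small experiment makes the idea of ill-conditioning concrete. The sketch below rests on several assumptions: the nearly singular 2 x 2 matrix is made up for illustration, the condition number is taken in the infinity norm (maximum absolute row sum), whereas matrix programs commonly report the 2-norm version, and the explicit inverse is used only because the matrix is 2 x 2.

```python
from fractions import Fraction as F

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix by the determinant formula (hypothetical helper)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(M, v):
    """Return the matrix-vector product Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm_inf(M):
    """Infinity norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

# Nearly singular matrix: the two rows differ only in the fourth decimal place.
A = [[F(1), F(1)], [F(1), F(10001, 10000)]]
A_inv = inverse_2x2(A)

b = [F(2), F(20001, 10000)]            # right side 2, 2.0001
b_perturbed = [F(2), F(20002, 10000)]  # second entry changed by 0.0001

x = apply(A_inv, b)
x_perturbed = apply(A_inv, b_perturbed)
cond = norm_inf(A) * norm_inf(A_inv)

print([float(v) for v in x], [float(v) for v in x_perturbed], float(cond))
```

A change of 0.0001 in one entry of b moves the exact solution a distance comparable to the solution itself, which is the behavior a large condition number warns about.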


PRACTICE PROBLEMS 



1. Determine if A = \begin{bmatrix} 2 & 3 & 4 \\ 2 & 3 & 4 \\ 2 & 3 & 4 \end{bmatrix} is invertible.

2. Suppose that for a certain n x n matrix A, statement (g) of the Invertible Matrix
Theorem is not true. What can you say about equations of the form Ax = b?

3. Suppose that A and B are n x n matrices and the equation ABx = 0 has a nontrivial
solution. What can you say about the matrix AB?
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2.3 EXERCISES 


Unless otherwise specified, assume that all matrices in these exercises are n x n.
Determine which of the matrices in Exercises 1-10 are invertible. Use as few
calculations as possible. Justify your answers.

1. \begin{bmatrix} 5 & 7 \\ -3 & -6 \end{bmatrix}

2. \begin{bmatrix} -4 & 2 \\ 6 & -3 \end{bmatrix}

3. \begin{bmatrix} 3 & 0 & 0 \\ -3 & -4 & 0 \\ 8 & 5 & -3 \end{bmatrix}

4. \begin{bmatrix} -5 & 1 & 4 \\ 0 & 0 & 0 \\ 1 & 4 & 9 \end{bmatrix}

5. \begin{bmatrix} 3 & 0 & -3 \\ 2 & 0 & 4 \\ -4 & 0 & 7 \end{bmatrix}

6. \begin{bmatrix} 1 & -3 & -6 \\ 0 & 4 & 3 \\ -3 & 6 & 0 \end{bmatrix}

7. \begin{bmatrix} -1 & -3 & 0 & ? \\ 3 & 5 & 8 & -3 \\ -2 & -6 & 3 & 2 \\ 0 & -1 & 2 & 1 \end{bmatrix}

8. \begin{bmatrix} 3 & 4 & 7 & 4 \\ 0 & 1 & 4 & 6 \\ 0 & 0 & 2 & 8 \\ 0 & 0 & 0 & 1 \end{bmatrix}

9. [M] \begin{bmatrix} 4 & 0 & -3 & -7 \\ ? & 9 & 9 & 9 \\ ? & -5 & 10 & 19 \\ -1 & 2 & 4 & -1 \end{bmatrix}

10. [M] \begin{bmatrix} 5 & 3 & 1 & 7 & 9 \\ 6 & 4 & 2 & 8 & -8 \\ 7 & 5 & 3 & 10 & 9 \\ 9 & 6 & 4 & -9 & -5 \\ 8 & 5 & 2 & 11 & 4 \end{bmatrix}

In Exercises 11 and 12, the matrices are all n x n. Each part
of the exercises is an implication of the form “If ⟨statement 1⟩,
then ⟨statement 2⟩.” Mark an implication as True if the truth of
⟨statement 2⟩ always follows whenever ⟨statement 1⟩ happens
to be true. An implication is False if there is an instance in which
⟨statement 2⟩ is false but ⟨statement 1⟩ is true. Justify each
answer.

11. a. If the equation Ax = 0 has only the trivial solution, then
A is row equivalent to the n x n identity matrix.

b. If the columns of A span R^n, then the columns are linearly
independent.

c. If A is an n x n matrix, then the equation Ax = b has at
least one solution for each b in R^n.

d. If the equation Ax = 0 has a nontrivial solution, then A
has fewer than n pivot positions.

e. If A^T is not invertible, then A is not invertible.

12. a. If there is an n x n matrix D such that AD = I, then
DA = I.

b. If the linear transformation x ↦ Ax maps R^n into R^n,
then the row reduced echelon form of A is I.

c. If the columns of A are linearly independent, then the
columns of A span R^n.

d. If the equation Ax = b has at least one solution for each b
in R^n, then the transformation x ↦ Ax is not one-to-one.

e. If there is a b in R^n such that the equation Ax = b is
consistent, then the solution is unique.

13. An m x n upper triangular matrix is one whose entries
below the main diagonal are 0’s (as in Exercise 8). When
is a square upper triangular matrix invertible? Justify your
answer.

14. An m x n lower triangular matrix is one whose entries
above the main diagonal are 0’s (as in Exercise 3). When
is a square lower triangular matrix invertible? Justify your
answer.

15. Is it possible for a 4 x 4 matrix to be invertible when its
columns do not span R^4? Why or why not?

16. If an n x n matrix A is invertible, then the columns of A^T are
linearly independent. Explain why.

17. Can a square matrix with two identical columns be invertible?
Why or why not?

18. Can a square matrix with two identical rows be invertible?
Why or why not?

19. If the columns of a 7 x 7 matrix D are linearly independent,
what can be said about the solutions of Dx = b? Why?

20. If A is a 5 x 5 matrix and the equation Ax = b is consistent
for every b in R^5, is it possible that for some b, the equation
Ax = b has more than one solution? Why or why not?

21. If the equation Cu = v has more than one solution for some
v in R^n, can the columns of the n x n matrix C span R^n?
Why or why not?

22. If n x n matrices E and F have the property that EF = I,
then E and F commute. Explain why.

23. Assume that F is an n x n matrix. If the equation Fx = y
is inconsistent for some y in R^n, what can you say about the
equation Fx = 0? Why?

24. If an n x n matrix G cannot be row reduced to I_n, what can
you say about the columns of G? Why?

25. Verify the boxed statement preceding Example 1.

26. Explain why the columns of A^2 span R^n whenever the
columns of an n x n matrix A are linearly independent.

27. Let A and B be n x n matrices. Show that if AB is invertible,
so is A. You cannot use Theorem 6(b), because you cannot
assume that A and B are invertible. [Hint: There is a matrix
W such that ABW = I. Why?]

28. Let A and B be n x n matrices. Show that if AB is invertible,
so is B.

29. If A is an n x n matrix and the transformation x ↦ Ax is
one-to-one, what else can you say about this transformation?
Justify your answer.

30. If A is an n x n matrix and the equation Ax = b has
more than one solution for some b, then the transformation
x ↦ Ax is not one-to-one. What else can you say about this
transformation? Justify your answer.

31. Suppose A is an n x n matrix with the property that the
equation Ax = b has at least one solution for each b in R^n.
Without using Theorems 5 or 8, explain why each equation
Ax = b has in fact exactly one solution.

32. Suppose A is an n x n matrix with the property that the equation
Ax = 0 has only the trivial solution. Without using the
Invertible Matrix Theorem, explain directly why the equation
Ax = b must have a solution for each b in R^n.

In Exercises 33 and 34, T is a linear transformation from R^2 into
R^2. Show that T is invertible and find a formula for T^{-1}.

33. T(x_1, x_2) = (-5x_1 + 9x_2, 4x_1 - 7x_2)

34. T(x_1, x_2) = (2x_1 - 8x_2, -2x_1 + 7x_2)

35. Let T : R^n → R^n be an invertible linear transformation.
Explain why T is both one-to-one and onto R^n. Use equations
(1) and (2). Then give a second explanation using one or
more theorems.

36. Suppose a linear transformation T : R^n → R^n has the property
that T(u) = T(v) for some pair of distinct vectors u and
v in R^n. Can T map R^n onto R^n? Why or why not?

37. Suppose T and U are linear transformations from R^n to
R^n such that T(U(x)) = x for all x in R^n. Is it true that
U(T(x)) = x for all x in R^n? Why or why not?

38. Let T : R^n → R^n be an invertible linear transformation,
and let S and U be functions from R^n into R^n such that
S(T(x)) = x and U(T(x)) = x for all x in R^n. Show that
U(v) = S(v) for all v in R^n. This will show that T has a
unique inverse, as asserted in Theorem 9. [Hint: Given any
v in R^n, we can write v = T(x) for some x. Why? Compute
S(v) and U(v).]

39. Let T be a linear transformation that maps R^n onto R^n. Show
that T^{-1} exists and maps R^n onto R^n. Is T^{-1} also one-to-one?

40. Suppose T and S satisfy the invertibility equations (1) and
(2), where T is a linear transformation. Show directly that
S is a linear transformation. [Hint: Given u, v in R^n, let
x = S(u), y = S(v). Then T(x) = u, T(y) = v. Why? Apply
S to both sides of the equation T(x) + T(y) = T(x + y).
Also, consider T(cx) = cT(x).]


41. [M] Suppose an experiment leads to the following system of
equations:

4.5x_1 + 3.1x_2 = 19.249
1.6x_1 + 1.1x_2 = 6.843        (3)

a. Solve system (3), and then solve system (4), below, in
which the data on the right have been rounded to two
decimal places. In each case, find the exact solution.

4.5x_1 + 3.1x_2 = 19.25
1.6x_1 + 1.1x_2 = 6.84        (4)

b. The entries in system (4) differ from those in system (3)
by less than .05%. Find the percentage error when using
the solution of (4) as an approximation for the solution of
(3).

c. Use a matrix program to produce the condition number of
the coefficient matrix in (3).

Exercises 42-44 show how to use the condition number of a
matrix A to estimate the accuracy of a computed solution of
Ax = b. If the entries of A and b are accurate to about r significant
digits and if the condition number of A is approximately 10^k (with
k a positive integer), then the computed solution of Ax = b should
usually be accurate to at least r - k significant digits.

42. [M] Let A be the matrix in Exercise 9. Find the condition
number of A. Construct a random vector x in R^4 and compute
b = Ax. Then use a matrix program to compute the solution
x_1 of Ax = b. To how many digits do x and x_1 agree? Find
out the number of digits the matrix program stores accurately,
and report how many digits of accuracy are lost when x_1 is
used in place of the exact solution x.

43. [M] Repeat Exercise 42 for the matrix in Exercise 10.

44. [M] Solve an equation Ax = b for a suitable b to find the last
column of the inverse of the fifth-order Hilbert matrix

\[
A = \begin{bmatrix}
1 & 1/2 & 1/3 & 1/4 & 1/5 \\
1/2 & 1/3 & 1/4 & 1/5 & 1/6 \\
1/3 & 1/4 & 1/5 & 1/6 & 1/7 \\
1/4 & 1/5 & 1/6 & 1/7 & 1/8 \\
1/5 & 1/6 & 1/7 & 1/8 & 1/9
\end{bmatrix}
\]

How many digits in each entry of x do you expect to be
correct? Explain. [Note: The exact solution is (630, -12600,
56700, -88200, 44100).]

45. [M] Some matrix programs, such as MATLAB, have a command
to create Hilbert matrices of various sizes. If possible,
use an inverse command to compute the inverse of a twelfth-order
or larger Hilbert matrix, A. Compute AA^{-1}. Report
what you find.


SOLUTIONS TO PRACTICE PROBLEMS


1. The columns of A are obviously linearly dependent because columns 2 and 3 are 
multiples of column 1. Hence A cannot be invertible, by the Invertible Matrix 
Theorem. 

2. If statement (g) is not true, then the equation Ax = b is inconsistent for at least one
b in R^n.

3. Apply the Invertible Matrix Theorem to the matrix AB in place of A. Then statement 
(d) becomes: ABx = 0 has only the trivial solution. This is not true. So AB is not 
invertible. 


2.4 PARTITIONED MATRICES 


A key feature of our work with matrices has been the ability to regard a matrix A as a list
of column vectors rather than just a rectangular array of numbers. This point of view has
been so useful that we wish to consider other partitions of A, indicated by horizontal
and vertical dividing rules, as in Example 1 below. Partitioned matrices appear in most
modern applications of linear algebra because the notation highlights essential structures
in matrix analysis, as in the chapter introductory example on aircraft design. This section
provides an opportunity to review matrix algebra and use the Invertible Matrix Theorem.



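EXAMPLE 1 The matrix

\[
A = \left[\begin{array}{rrr|rr|r}
3 & 0 & -1 & 5 & 9 & -2 \\
-5 & 2 & 4 & 0 & -3 & 1 \\ \hline
-8 & -6 & 3 & 1 & 7 & -4
\end{array}\right]
\]

can also be written as the 2 x 3 partitioned (or block) matrix

\[
A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{bmatrix}
\]

whose entries are the blocks (or submatrices)

\[
A_{11} = \begin{bmatrix} 3 & 0 & -1 \\ -5 & 2 & 4 \end{bmatrix}, \quad
A_{12} = \begin{bmatrix} 5 & 9 \\ 0 & -3 \end{bmatrix}, \quad
A_{13} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}
\]
\[
A_{21} = \begin{bmatrix} -8 & -6 & 3 \end{bmatrix}, \quad
A_{22} = \begin{bmatrix} 1 & 7 \end{bmatrix}, \quad
A_{23} = \begin{bmatrix} -4 \end{bmatrix}
\]
■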


EXAMPLE 2 When a matrix A appears in a mathematical model of a physical
system such as an electrical network, a transportation system, or a large corporation,
it may be natural to regard A as a partitioned matrix. For instance, if a microcomputer
circuit board consists mainly of three VLSI (very large-scale integrated) microchips,
then the matrix for the circuit board might have the general form

\[
A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix}
\]

The submatrices on the “diagonal” of A (namely, A_{11}, A_{22}, and A_{33}) concern the three
VLSI chips, while the other submatrices depend on the interconnections among those
microchips. ■

Addition and Scalar Multiplication

If matrices A and B are the same size and are partitioned in exactly the same way,
then it is natural to make the same partition of the ordinary matrix sum A + B. In this
case, each block of A + B is the (matrix) sum of the corresponding blocks of A and B.
Multiplication of a partitioned matrix by a scalar is also computed block by block.

Multiplication of Partitioned Matrices

Partitioned matrices can be multiplied by the usual row-column rule as if the block
entries were scalars, provided that for a product AB, the column partition of A matches
the row partition of B.


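EXAMPLE 3 Let

\[
A = \left[\begin{array}{rrr|rr}
2 & -3 & 1 & 0 & -4 \\
1 & 5 & -2 & 3 & -1 \\ \hline
0 & -4 & -2 & 7 & -1
\end{array}\right]
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\qquad
B = \left[\begin{array}{rr}
6 & 4 \\ -2 & 1 \\ -3 & 7 \\ \hline -1 & 3 \\ 5 & 2
\end{array}\right]
= \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}
\]

The 5 columns of A are partitioned into a set of 3 columns and then a set of 2
columns. The 5 rows of B are partitioned in the same way: into a set of 3 rows and
then a set of 2 rows. We say that the partitions of A and B are conformable for block
multiplication. It can be shown that the ordinary product AB can be written as

\[
AB = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_1 \\ B_2 \end{bmatrix}
= \begin{bmatrix} A_{11}B_1 + A_{12}B_2 \\ A_{21}B_1 + A_{22}B_2 \end{bmatrix}
= \left[\begin{array}{rr} -5 & 4 \\ -6 & 2 \\ \hline 2 & 1 \end{array}\right]
\]

It is important for each smaller product in the expression for AB to be written with
the submatrix from A on the left, since matrix multiplication is not commutative. For
instance,

\[
A_{11}B_1 = \begin{bmatrix} 2 & -3 & 1 \\ 1 & 5 & -2 \end{bmatrix}
\begin{bmatrix} 6 & 4 \\ -2 & 1 \\ -3 & 7 \end{bmatrix}
= \begin{bmatrix} 15 & 12 \\ 2 & -5 \end{bmatrix}
\]
\[
A_{12}B_2 = \begin{bmatrix} 0 & -4 \\ 3 & -1 \end{bmatrix}
\begin{bmatrix} -1 & 3 \\ 5 & 2 \end{bmatrix}
= \begin{bmatrix} -20 & -8 \\ -8 & 7 \end{bmatrix}
\]

Hence the top block in AB is

\[
A_{11}B_1 + A_{12}B_2 = \begin{bmatrix} 15 & 12 \\ 2 & -5 \end{bmatrix}
+ \begin{bmatrix} -20 & -8 \\ -8 & 7 \end{bmatrix}
= \begin{bmatrix} -5 & 4 \\ -6 & 2 \end{bmatrix}
\]
■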
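The block computation in Example 3 can be checked against the ordinary product. The sketch below is only an illustration: the helper names `matmul` and `matadd` are assumptions, matrices are stored as plain lists of rows, and the entries of A and B are those of Example 3.

```python
def matmul(A, B):
    """Ordinary matrix product by the row-column rule."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[2, -3, 1, 0, -4],
     [1, 5, -2, 3, -1],
     [0, -4, -2, 7, -1]]
B = [[6, 4], [-2, 1], [-3, 7], [-1, 3], [5, 2]]

# Partition A after row 2 and column 3; partition B after row 3.
A11 = [row[:3] for row in A[:2]]
A12 = [row[3:] for row in A[:2]]
A21 = [row[:3] for row in A[2:]]
A22 = [row[3:] for row in A[2:]]
B1, B2 = B[:3], B[3:]

# Block row-column rule: each block row of A times the block column of B.
top = matadd(matmul(A11, B1), matmul(A12, B2))
bottom = matadd(matmul(A21, B1), matmul(A22, B2))
block_product = top + bottom   # stack the two block rows

print(block_product == matmul(A, B))
```

The column split of A (3 then 2) matches the row split of B, which is exactly the conformability condition; changing one split without the other makes the block products undefined.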


The row-column rule for multiplication of block matrices provides the most general
way to regard the product of two matrices. Each of the following views of a product
has already been described using simple partitions of matrices: (1) the definition of Ax
using the columns of A, (2) the column definition of AB, (3) the row-column rule for
computing AB, and (4) the rows of AB as products of the rows of A and the matrix B.
A fifth view of AB, again using partitions, follows in Theorem 10 below.

The calculations in the next example prepare the way for Theorem 10. Here col_k(A)
is the kth column of A, and row_k(B) is the kth row of B.

EXAMPLE 4 Let A = \begin{bmatrix} -3 & 1 & 2 \\ 1 & -4 & 5 \end{bmatrix} and B = \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}. Verify that

AB = col_1(A) row_1(B) + col_2(A) row_2(B) + col_3(A) row_3(B)

SOLUTION Each term above is an outer product. (See Exercises 27 and 28 in Section 2.1.)
By the row-column rule for computing a matrix product,

\[
\mathrm{col}_1(A)\,\mathrm{row}_1(B) = \begin{bmatrix} -3 \\ 1 \end{bmatrix}\begin{bmatrix} a & b \end{bmatrix}
= \begin{bmatrix} -3a & -3b \\ a & b \end{bmatrix}
\]
\[
\mathrm{col}_2(A)\,\mathrm{row}_2(B) = \begin{bmatrix} 1 \\ -4 \end{bmatrix}\begin{bmatrix} c & d \end{bmatrix}
= \begin{bmatrix} c & d \\ -4c & -4d \end{bmatrix}
\]
\[
\mathrm{col}_3(A)\,\mathrm{row}_3(B) = \begin{bmatrix} 2 \\ 5 \end{bmatrix}\begin{bmatrix} e & f \end{bmatrix}
= \begin{bmatrix} 2e & 2f \\ 5e & 5f \end{bmatrix}
\]

Thus

\[
\sum_{k=1}^{3} \mathrm{col}_k(A)\,\mathrm{row}_k(B)
= \begin{bmatrix} -3a + c + 2e & -3b + d + 2f \\ a - 4c + 5e & b - 4d + 5f \end{bmatrix}
\]

This matrix is obviously AB. Notice that the (1,1)-entry in AB is the sum of the (1,1)-entries
in the three outer products, the (1,2)-entry in AB is the sum of the (1,2)-entries
in the three outer products, and so on. ■

THEOREM 10

Column-Row Expansion of AB

If A is m x n and B is n x p, then

\[
AB = \begin{bmatrix} \mathrm{col}_1(A) & \mathrm{col}_2(A) & \cdots & \mathrm{col}_n(A) \end{bmatrix}
\begin{bmatrix} \mathrm{row}_1(B) \\ \mathrm{row}_2(B) \\ \vdots \\ \mathrm{row}_n(B) \end{bmatrix}
= \mathrm{col}_1(A)\,\mathrm{row}_1(B) + \cdots + \mathrm{col}_n(A)\,\mathrm{row}_n(B)
\qquad (1)
\]

PROOF For each row index i and column index j, the (i, j)-entry in col_k(A) row_k(B)
is the product of a_{ik} from col_k(A) and b_{kj} from row_k(B). Hence the (i, j)-entry in the
sum shown in equation (1) is

a_{i1}b_{1j} + a_{i2}b_{2j} + ... + a_{in}b_{nj}

(k = 1)      (k = 2)            (k = n)

This sum is also the (i, j)-entry in AB, by the row-column rule. ■
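The column-row expansion can be tried on numbers. In the sketch below, the matrix A is the 2 x 3 matrix of Example 4 (as its entries can be read from the outer products there), while the numeric B and the helper names `matmul`, `matadd`, and `outer` are assumptions made for illustration.

```python
def matmul(A, B):
    """Ordinary matrix product by the row-column rule."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def outer(col, row):
    """Outer product of a column (length m) and a row (length p): an m x p matrix."""
    return [[c * r for r in row] for c in col]

A = [[-3, 1, 2], [1, -4, 5]]      # 2 x 3
B = [[1, 2], [3, 4], [5, 6]]      # 3 x 2, illustrative numeric entries

columns_of_A = list(zip(*A))      # col_1(A), col_2(A), col_3(A)

# Sum of the n outer products col_k(A) row_k(B).
expansion = [[0] * len(B[0]) for _ in range(len(A))]
for k in range(len(B)):
    expansion = matadd(expansion, outer(columns_of_A[k], B[k]))

print(expansion == matmul(A, B))
```

Each outer product has rank at most 1, so the expansion writes AB as a sum of n simple pieces, one for each column of A paired with the matching row of B.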

Inverses of Partitioned Matrices 

The next example illustrates calculations involving inverses and partitioned matrices. 

EXAMPLE 5 A matrix of the form 

_ \A U A 12 

L 0 ^ 22 . 

is said to be block upper triangular. Assume that An is p x p, A 22 is q x q, and A is 
invertible. Find a formula for A~ l . 
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SOLUTION Denote A~ l by B and partition B so that 


~A n A n ~ 

~B U 

B\2 



0 ' 

0 A 22 _ 

B 21 

B 22 - 


0 

4. 


⑵ 


This matrix equation provides four equations that will lead to the unknown blocks 
Bu ， ... ， 召 22 . Compute the product on the left side of equation (2)，and equate each entry 
with the corresponding block in the identity matrix on the right. That is, set 


乂 11 忍 11 + ^ 12^21 = Ip ( 3 ) 

^11^12 + ^12^22 = 0 (4) 

^ 22^21 = 0 ( 5 ) 

^22^22 = Iq (6) 


By itself, equation (6) does not show that A 22 is invertible. However, since A 22 is 
square, the Invertible Matrix Theorem and (6) together show that A 22 is invertible and 
B 22 = ^22 - Next, left-multiply both sides of (5) by and obtain 

B21 = = 0 


so that (3) simplifies to 

^11^11 + 0 = I p 

Since An is square, this shows that A\\ is invertible and B\\ = . Finally, use these 

results with (4) to find that 

^ 11-^12 = —^ 12-^22 = —^ 12^22 and = —^11 ^ 12^22 

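Thus

$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12}A_{22}^{-1} \\ 0 & A_{22}^{-1} \end{bmatrix}$$

■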
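The formula from Example 5 can be spot-checked on a small block upper triangular matrix. This is only a sketch using exact fractions; the blocks `A11`, `A12`, `A22` and the helpers (`inverse`, `block`, and so on) are hypothetical names chosen for illustration.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse(M):
    """Gauss-Jordan inverse of a small invertible square matrix."""
    n = len(M)
    aug = [[Fraction(x) for x in row] +
           [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def block(tl, tr, bl, br):
    """Assemble a 2 x 2 block matrix from its four blocks."""
    return ([a + b for a, b in zip(tl, tr)] +
            [a + b for a, b in zip(bl, br)])

def neg(M):
    return [[-x for x in row] for row in M]

def zeros(r, c):
    return [[0] * c for _ in range(r)]

# Hypothetical blocks: A11 is 2 x 2, A12 is 2 x 1, A22 is 1 x 1.
A11, A12, A22 = [[2, 1], [1, 1]], [[3], [5]], [[4]]
A = block(A11, A12, zeros(1, 2), A22)

inv11, inv22 = inverse(A11), inverse(A22)
B12 = neg(matmul(matmul(inv11, A12), inv22))   # -A11^{-1} A12 A22^{-1}
A_inv = block(inv11, B12, zeros(1, 2), inv22)

identity = [[int(i == j) for j in range(3)] for i in range(3)]
assert matmul(A, A_inv) == identity
assert matmul(A_inv, A) == identity
```

Only the two diagonal blocks are ever inverted; the off-diagonal block of the inverse comes from matrix products.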


A block diagonal matrix is a partitioned matrix with zero blocks off the main 
diagonal (of blocks). Such a matrix is invertible if and only if each block on the diagonal 
is invertible. See Exercises 13 and 14. 


NUMERICAL NOTES

1. When matrices are too large to fit in a computer’s high-speed memory,
partitioning permits the computer to work with only two or three submatrices
at a time. For instance, one linear programming research team simplified
a problem by partitioning the matrix into 837 rows and 51 columns. The
problem’s solution took about 4 minutes on a Cray supercomputer.¹

2. Some high-speed computers, particularly those with vector pipeline architecture,
perform matrix calculations more efficiently when the algorithms use
partitioned matrices.²

3. Professional software for high-performance numerical linear algebra, such as
LAPACK, makes intensive use of partitioned matrix calculations.


1 The solution time doesn’t sound too impressive until you learn that each of the 51 block columns contained 
about 250,000 individual columns. The original problem had 837 equations and over 12,750,000 variables! 
Nearly 100 million of the more than 10 billion entries in the matrix were nonzero. See Robert E. Bixby et 
al., “Very Large-Scale Linear Programming: A Case Study in Combining Interior Point and Simplex 
Methods,” Operations Research, 40, no. 5 (1992): 885-897. 

2 The importance of block matrix algorithms for computer calculations is described in Matrix Computations,
3rd ed., by Gene H. Golub and Charles F. Van Loan (Baltimore: Johns Hopkins University Press, 1996).

In Exercises 11 and 12, mark each statement True or False. Justify
each answer.

11. a. If $A = [\,A_1 \;\; A_2\,]$ and $B = [\,B_1 \;\; B_2\,]$, with $A_1$ and
$A_2$ the same sizes as $B_1$ and $B_2$, respectively, then
$A + B = [\,A_1 + B_1 \;\; A_2 + B_2\,]$.


b. If $A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$ and $B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}$, then the partitions of $A$ and $B$ are conformable for block multiplication.


12. a. If $A_1$, $A_2$, $B_1$, and $B_2$ are $n \times n$ matrices, $A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$, and $B = [\,B_1 \;\; B_2\,]$, then the product $BA$ is defined, but $AB$ is not.


b. If $A = \begin{bmatrix} P & Q \\ R & S \end{bmatrix}$, then the transpose of $A$ is $A^T = \begin{bmatrix} P^T & Q^T \\ R^T & S^T \end{bmatrix}$.

13. Let $A = \begin{bmatrix} B & 0 \\ 0 & C \end{bmatrix}$, where $B$ and $C$ are square. Show that $A$ is invertible if and only if both $B$ and $C$ are invertible.

14. Show that the block upper triangular matrix $A$ in Example 5 is
invertible if and only if both $A_{11}$ and $A_{22}$ are invertible. [Hint:
If $A_{11}$ and $A_{22}$ are invertible, the formula for $A^{-1}$ given in Example 5
actually works as the inverse of $A$.] This fact about
$A$ is an important part of several computer algorithms that
estimate eigenvalues of matrices. Eigenvalues are discussed
in Chapter 5.

15. When a deep space probe is launched, corrections may
be necessary to place the probe on a precisely calculated
trajectory. Radio telemetry provides a stream of vectors,
$\mathbf{x}_1, \ldots, \mathbf{x}_k$, giving information at different times about how
the probe’s position compares with its planned trajectory.
Let $X_k$ be the matrix $[\,\mathbf{x}_1 \;\cdots\; \mathbf{x}_k\,]$. The matrix $G_k = X_k X_k^T$ is
computed as the radar data are analyzed. When $\mathbf{x}_{k+1}$ arrives,
a new $G_{k+1}$ must be computed. Since the data vectors arrive
at high speed, the computational burden could be severe.
But partitioned matrix multiplication helps tremendously.
Compute the column-row expansions of $G_k$ and $G_{k+1}$, and
describe what must be computed in order to update $G_k$ to
form $G_{k+1}$.


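In Exercises 1-9, assume that the matrices are partitioned conformably
for block multiplication. Compute the products shown
in Exercises 1-4.

1. $\begin{bmatrix} I & 0 \\ E & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix}$

2. $\begin{bmatrix} E & 0 \\ 0 & F \end{bmatrix} \begin{bmatrix} P & Q \\ R & S \end{bmatrix}$

3. $\begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix}$

4. $\begin{bmatrix} I & 0 \\ -E & I \end{bmatrix} \begin{bmatrix} W & X \\ Y & Z \end{bmatrix}$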


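In Exercises 5-8, find formulas for X, Y, and Z in terms of A,
B, and C, and justify your calculations. In some cases, you may
need to make assumptions about the size of a matrix in order to
produce a formula. [Hint: Compute the product on the left, and
set it equal to the right side.]

5. $\begin{bmatrix} A & B \\ C & 0 \end{bmatrix} \begin{bmatrix} I & 0 \\ X & Y \end{bmatrix} = \begin{bmatrix} 0 & I \\ Z & 0 \end{bmatrix}$

6. $\begin{bmatrix} X & 0 \\ Y & Z \end{bmatrix} \begin{bmatrix} A & 0 \\ B & C \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}$

7. $\begin{bmatrix} X & 0 & 0 \\ Y & 0 & I \end{bmatrix} \begin{bmatrix} A & Z \\ 0 & 0 \\ B & I \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}$

8. $\begin{bmatrix} A & B \\ 0 & I \end{bmatrix} \begin{bmatrix} X & Y & Z \\ 0 & 0 & I \end{bmatrix} = \begin{bmatrix} I & 0 & 0 \\ 0 & 0 & I \end{bmatrix}$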


9. Suppose $B_{11}$ is an invertible matrix. Find matrices $A_{21}$ and
$A_{31}$ (in terms of the blocks of $B$) such that the product below
has the form indicated. Also, compute $C_{22}$ (in terms of the
blocks of $B$). [Hint: Compute the product on the left, and set
it equal to the right side.]

$$\begin{bmatrix} I & 0 & 0 \\ A_{21} & I & 0 \\ A_{31} & 0 & I \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{bmatrix} = \begin{bmatrix} C_{11} & C_{12} \\ 0 & C_{22} \\ 0 & C_{32} \end{bmatrix}$$

10. The inverse of

$$\begin{bmatrix} I & 0 & 0 \\ A & I & 0 \\ B & D & I \end{bmatrix} \quad\text{is}\quad \begin{bmatrix} I & 0 & 0 \\ P & I & 0 \\ Q & R & I \end{bmatrix}$$

Find P, Q, and R.

The exercises that follow give practice with matrix algebra and illustrate typical
calculations found in applications.

PRACTICE PROBLEMS

1. Show that $\begin{bmatrix} I & 0 \\ A & I \end{bmatrix}$ is invertible and find its inverse.

2. Compute $X^T X$, where $X$ is partitioned as $[\,X_1 \;\; X_2\,]$.

2.4 EXERCISES

The probe Galileo was launched October 18, 1989, and arrived near Jupiter in early December 1995.


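16. Let $A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$. If $A_{11}$ is invertible, then the matrix
$S = A_{22} - A_{21}A_{11}^{-1}A_{12}$ is called the Schur complement
of $A_{11}$. Likewise, if $A_{22}$ is invertible, the matrix
$A_{11} - A_{12}A_{22}^{-1}A_{21}$ is called the Schur complement of $A_{22}$.
Suppose $A_{11}$ is invertible. Find $X$ and $Y$ such that

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} I & 0 \\ X & I \end{bmatrix} \begin{bmatrix} A_{11} & 0 \\ 0 & S \end{bmatrix} \begin{bmatrix} I & Y \\ 0 & I \end{bmatrix} \tag{7}$$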


17. Suppose the block matrix $A$ on the left side of (7) is invertible
and $A_{11}$ is invertible. Show that the Schur complement $S$ of
$A_{11}$ is invertible. [Hint: The outside factors on the right side
of (7) are always invertible. Verify this.] When $A$ and $A_{11}$
are both invertible, (7) leads to a formula for $A^{-1}$, using $S^{-1}$, $A_{11}^{-1}$,
and the other entries in $A$.


18. Let $X$ be an $m \times n$ data matrix such that $X^T X$ is invertible,
and let $M = I_m - X(X^T X)^{-1}X^T$. Add a column $\mathbf{x}_0$ to the
data and form

$$W = [\,X \;\; \mathbf{x}_0\,]$$

Compute $W^T W$. The (1, 1)-entry is $X^T X$. Show that the
Schur complement (Exercise 16) of $X^T X$ can be written
in the form $\mathbf{x}_0^T M \mathbf{x}_0$. It can be shown that the quantity
$(\mathbf{x}_0^T M \mathbf{x}_0)^{-1}$ is the (2, 2)-entry in $(W^T W)^{-1}$. This entry
has a useful statistical interpretation, under appropriate
hypotheses.


19. Assume $A - sI_n$ is invertible and view (8) as a system of two
matrix equations. Solve the top equation for $\mathbf{x}$ and substitute
into the bottom equation. The result is an equation of the
form $W(s)\mathbf{u} = \mathbf{y}$, where $W(s)$ is a matrix that depends on $s$.
$W(s)$ is called the transfer function of the system because
it transforms the input $\mathbf{u}$ into the output $\mathbf{y}$. Find $W(s)$ and
describe how it is related to the partitioned system matrix on
the left side of (8). See Exercise 16.


20. Suppose the transfer function $W(s)$ in Exercise 19 is invertible
for some $s$. It can be shown that the inverse transfer
function $W(s)^{-1}$, which transforms outputs into inputs, is the
Schur complement of $A - BC - sI_n$ for the matrix below.
Find this Schur complement. See Exercise 16.

$$\begin{bmatrix} A - BC - sI_n & B \\ -C & I_m \end{bmatrix}$$

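21. a. Verify that $A^2 = I$ when $A = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}$.

b. Use partitioned matrices to show that $M^2 = I$ when

$$M = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & -2 & 1 \end{bmatrix}$$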


22. Generalize the idea of Exercise 21 by constructing a $6 \times 6$
matrix $M = \begin{bmatrix} A & 0 & 0 \\ 0 & B & 0 \\ C & 0 & D \end{bmatrix}$ such that $M^2 = I$. Make $C$ a
nonzero $2 \times 2$ matrix. Show that your construction works.


23. Use partitioned matrices to prove by induction that the product
of two lower triangular matrices is also lower triangular.
[Hint: A $(k + 1) \times (k + 1)$ matrix $A_1$ can be written in the
form below, where $a$ is a scalar, $\mathbf{v}$ is in $\mathbb{R}^k$, and $A$ is a $k \times k$
lower triangular matrix. See the Study Guide for help with
induction.]

$$A_1 = \begin{bmatrix} a & \mathbf{0}^T \\ \mathbf{v} & A \end{bmatrix}$$

The Principle of Induction 2-19

24. Use partitioned matrices to prove by induction that for
$n = 2, 3, \ldots$, the $n \times n$ matrix $A$ shown below is invertible
and $B$ is its inverse.



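In the study of engineering control of physical systems, a standard
set of differential equations is transformed by Laplace transforms
into the following system of linear equations:

$$\begin{bmatrix} A - sI_n & B \\ C & I_m \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ \mathbf{u} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{y} \end{bmatrix} \tag{8}$$

where $A$ is $n \times n$, $B$ is $n \times m$, $C$ is $m \times n$, and $s$ is a variable.
The vector $\mathbf{u}$ in $\mathbb{R}^m$ is the “input” to the system, $\mathbf{y}$ in $\mathbb{R}^m$ is the
“output,” and $\mathbf{x}$ in $\mathbb{R}^n$ is the “state” vector. (Actually, the vectors
$\mathbf{x}$, $\mathbf{u}$, and $\mathbf{y}$ are functions of $s$, but this does not affect the algebraic
calculations in Exercises 19 and 20.)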


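$$A = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & & 0 \\ 1 & 1 & 1 & & 0 \\ \vdots & & & \ddots & \\ 1 & 1 & 1 & \cdots & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -1 & 1 & 0 & & 0 \\ 0 & -1 & 1 & & 0 \\ \vdots & & \ddots & \ddots & \\ 0 & & \cdots & -1 & 1 \end{bmatrix}$$

For the induction step, assume $A$ and $B$ are
$(k + 1) \times (k + 1)$ matrices, and partition $A$ and $B$ in a form
similar to that displayed in Exercise 23.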

25. Without using row reduction, find the inverse of

$$A = \begin{bmatrix} 1 & 2 & 0 & 0 & 0 \\ 3 & 5 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 7 & 8 \\ 0 & 0 & 0 & 5 & 6 \end{bmatrix}$$

26. [M] For block operations, it may be necessary to access or
enter submatrices of a large matrix. Describe the functions
or commands of a matrix program that accomplish the following
tasks. Suppose $A$ is a $20 \times 30$ matrix.

a. Display the submatrix of $A$ from rows 5 to 10 and
columns 15 to 20.

b. Insert a $5 \times 10$ matrix $B$ into $A$, beginning at row 5 and
column 10.

c. Create a $50 \times 50$ matrix of the form $C = \begin{bmatrix} A & 0 \\ 0 & A^T \end{bmatrix}$.
[Note: It may not be necessary to specify the zero blocks
in $C$.]

27. [M] Suppose memory or size restrictions prevent a matrix
program from working with matrices having more than 32
rows and 32 columns, and suppose some project involves
$50 \times 50$ matrices $A$ and $B$. Describe the commands or operations
of the matrix program that accomplish the following
tasks.

a. Compute $A + B$.

b. Compute $AB$.

c. Solve $A\mathbf{x} = \mathbf{b}$ for some vector $\mathbf{b}$ in $\mathbb{R}^{50}$, assuming that
$A$ can be partitioned into a $2 \times 2$ block matrix $[A_{ij}]$,
with $A_{11}$ an invertible $20 \times 20$ matrix, $A_{22}$ an invertible
$30 \times 30$ matrix, and $A_{12}$ a zero matrix. [Hint: Describe
appropriate smaller systems to solve, without using any
matrix inverses.]


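SOLUTIONS TO PRACTICE PROBLEMS

1. If $\begin{bmatrix} I & 0 \\ A & I \end{bmatrix}$ is invertible, its inverse has the form $\begin{bmatrix} W & X \\ Y & Z \end{bmatrix}$. Verify that

$$\begin{bmatrix} I & 0 \\ A & I \end{bmatrix} \begin{bmatrix} W & X \\ Y & Z \end{bmatrix} = \begin{bmatrix} W & X \\ AW + Y & AX + Z \end{bmatrix}$$

So $W$, $X$, $Y$, $Z$ must satisfy $W = I$, $X = 0$, $AW + Y = 0$, and $AX + Z = I$. It
follows that $Y = -A$ and $Z = I$. Hence

$$\begin{bmatrix} I & 0 \\ A & I \end{bmatrix} \begin{bmatrix} I & 0 \\ -A & I \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}$$

The product in the reverse order is also the identity, so the block matrix is invertible,
and its inverse is $\begin{bmatrix} I & 0 \\ -A & I \end{bmatrix}$. (You could also appeal to the Invertible Matrix
Theorem.)

2. $X^T X = \begin{bmatrix} X_1^T \\ X_2^T \end{bmatrix} \begin{bmatrix} X_1 & X_2 \end{bmatrix} = \begin{bmatrix} X_1^T X_1 & X_1^T X_2 \\ X_2^T X_1 & X_2^T X_2 \end{bmatrix}$. The partitions of $X^T$ and $X$ are
automatically conformable for block multiplication because the columns of $X^T$ are
the rows of $X$. This partition of $X^T X$ is used in several computer algorithms for
matrix computations.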


2.5 MATRIX FACTORIZATIONS 


A factorization of a matrix A is an equation that expresses A as a product of two or more
matrices. Whereas matrix multiplication involves a synthesis of data (combining the
effects of two or more linear transformations into a single matrix), matrix factorization
is an analysis of data. In the language of computer science, the expression of A as a
product amounts to a preprocessing of the data in A, organizing that data into two or
more parts whose structures are more useful in some way, perhaps more accessible for
computation.

Matrix factorizations and, later, factorizations of linear transformations will appear
at a number of key points throughout the text. This section focuses on a factorization
that lies at the heart of several important computer programs widely used in applications,
such as the airflow problem described in the chapter introduction. Several other
factorizations, to be studied later, are introduced in the exercises.


The LU Factorization 

The LU factorization, described below, is motivated by the fairly common industrial
and business problem of solving a sequence of equations, all with the same coefficient
matrix:

$$A\mathbf{x} = \mathbf{b}_1, \quad A\mathbf{x} = \mathbf{b}_2, \quad \ldots, \quad A\mathbf{x} = \mathbf{b}_p \tag{1}$$

See Exercise 32, for example. Also see Section 5.8, where the inverse power method
is used to estimate eigenvalues of a matrix by solving equations like those in sequence
(1), one at a time.

When $A$ is invertible, one could compute $A^{-1}$ and then compute $A^{-1}\mathbf{b}_1$, $A^{-1}\mathbf{b}_2$,
and so on. However, it is more efficient to solve the first equation in sequence (1) by
row reduction and obtain an LU factorization of $A$ at the same time. Thereafter, the
remaining equations in sequence (1) are solved with the LU factorization.

At first, assume that $A$ is an $m \times n$ matrix that can be row reduced to echelon
form, without row interchanges. (Later, we will treat the general case.) Then $A$ can
be written in the form $A = LU$, where $L$ is an $m \times m$ lower triangular matrix with 1’s
on the diagonal and $U$ is an $m \times n$ echelon form of $A$. For instance, see Fig. 1. Such
a factorization is called an LU factorization of $A$. The matrix $L$ is invertible and is
called a unit lower triangular matrix.

$$A = \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ * & 1 & 0 & 0 \\ * & * & 1 & 0 \\ * & * & * & 1 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} \blacksquare & * & * & * & * \\ 0 & \blacksquare & * & * & * \\ 0 & 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}}_{U}$$

FIGURE 1 An LU factorization.


Before studying how to construct L and U, we should look at why they are so 
useful. When A = LU, the equation Ax = b can be written as L(Ux) = b. Writing y 
for Ux, we can find x by solving the pair of equations 


$$\begin{aligned} L\mathbf{y} &= \mathbf{b} \\ U\mathbf{x} &= \mathbf{y} \end{aligned} \tag{2}$$


First solve Ly = b for y, and then solve Ux = y for x. See Fig. 2. Each equation is 
easy to solve because L and U are triangular. 


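EXAMPLE 1 It can be verified that

$$A = \begin{bmatrix} 3 & -7 & -2 & 2 \\ -3 & 5 & 1 & 0 \\ 6 & -4 & 0 & -5 \\ -9 & 5 & -5 & 12 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 2 & -5 & 1 & 0 \\ -3 & 8 & 3 & 1 \end{bmatrix} \begin{bmatrix} 3 & -7 & -2 & 2 \\ 0 & -2 & -1 & 2 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & -1 \end{bmatrix} = LU$$

FIGURE 2 Factorization of the mapping $\mathbf{x} \mapsto A\mathbf{x}$.

Use this LU factorization of $A$ to solve $A\mathbf{x} = \mathbf{b}$, where $\mathbf{b} = \begin{bmatrix} -9 \\ 5 \\ 7 \\ 11 \end{bmatrix}$.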

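SOLUTION The solution of $L\mathbf{y} = \mathbf{b}$ needs only 6 multiplications and 6 additions, because
the arithmetic takes place only in column 5. (The zeros below each pivot in $L$ are
created automatically by the choice of row operations.)

$$[\,L \;\; \mathbf{b}\,] = \begin{bmatrix} 1 & 0 & 0 & 0 & -9 \\ -1 & 1 & 0 & 0 & 5 \\ 2 & -5 & 1 & 0 & 7 \\ -3 & 8 & 3 & 1 & 11 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & 0 & -9 \\ 0 & 1 & 0 & 0 & -4 \\ 0 & 0 & 1 & 0 & 5 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} = [\,I \;\; \mathbf{y}\,]$$

Then, for $U\mathbf{x} = \mathbf{y}$, the “backward” phase of row reduction requires 4 divisions, 6 multiplications,
and 6 additions. (For instance, creating the zeros in column 4 of $[\,U \;\; \mathbf{y}\,]$
requires 1 division in row 4 and 3 multiplication-addition pairs to add multiples of row 4
to the rows above.)

$$[\,U \;\; \mathbf{y}\,] = \begin{bmatrix} 3 & -7 & -2 & 2 & -9 \\ 0 & -2 & -1 & 2 & -4 \\ 0 & 0 & -1 & 1 & 5 \\ 0 & 0 & 0 & -1 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & 0 & 3 \\ 0 & 1 & 0 & 0 & 4 \\ 0 & 0 & 1 & 0 & -6 \\ 0 & 0 & 0 & 1 & -1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 3 \\ 4 \\ -6 \\ -1 \end{bmatrix}$$

To find $\mathbf{x}$ requires 28 arithmetic operations, or “flops” (floating point operations),
excluding the cost of finding $L$ and $U$. In contrast, row reduction of $[\,A \;\; \mathbf{b}\,]$ to $[\,I \;\; \mathbf{x}\,]$
takes 62 operations. ■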
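The two triangular solves in Example 1 can be reproduced with forward and back substitution. The sketch below uses exact fractions and assumes a square system with nonzero diagonal entries; the function names are illustrative only.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L, working from the top row down."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((Fraction(b[i]) - s) / row[i])
    return y

def back_substitution(U, y):
    """Solve Ux = y for upper triangular U, working from the bottom row up."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - s) / U[i][i]
    return x

# L, U, and b from Example 1.
L = [[1, 0, 0, 0], [-1, 1, 0, 0], [2, -5, 1, 0], [-3, 8, 3, 1]]
U = [[3, -7, -2, 2], [0, -2, -1, 2], [0, 0, -1, 1], [0, 0, 0, -1]]
b = [-9, 5, 7, 11]

y = forward_substitution(L, b)   # matches y = (-9, -4, 5, 1)
x = back_substitution(U, y)      # matches x = (3, 4, -6, -1)
```

Once $L$ and $U$ are stored, each additional right-hand side costs only these two passes.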


The computational efficiency of the LU factorization depends on knowing L and U. 
The next algorithm shows that the row reduction of A to an echelon form U amounts to 
an LU factorization because it produces L with essentially no extra work. After the first 
row reduction, L and U are available for solving additional equations whose coefficient 
matrix is A. 


An LU Factorization Algorithm 


Suppose $A$ can be reduced to an echelon form $U$ using only row replacements that add a
multiple of one row to another row below it. In this case, there exist unit lower triangular
elementary matrices $E_1, \ldots, E_p$ such that

$$E_p \cdots E_1 A = U \tag{3}$$

Then

$$A = (E_p \cdots E_1)^{-1} U = LU$$

where

$$L = (E_p \cdots E_1)^{-1} \tag{4}$$

It can be shown that products and inverses of unit lower triangular matrices are also unit
lower triangular. (For instance, see Exercise 19.) Thus $L$ is unit lower triangular.

Note that the row operations in equation (3), which reduce $A$ to $U$, also reduce
the $L$ in equation (4) to $I$, because $E_p \cdots E_1 L = (E_p \cdots E_1)(E_p \cdots E_1)^{-1} = I$. This
observation is the key to constructing $L$.


ALGORITHM FOR AN LU FACTORIZATION 

1. Reduce A to an echelon form U by a sequence of row replacement operations,
if possible.

2. Place entries in L such that the same sequence of row operations reduces L 
to I. 


Step 1 is not always possible, but when it is, the argument above shows that an LU 
factorization exists. Example 2 will show how to implement step 2. By construction, L 
will satisfy 

$$(E_p \cdots E_1)L = I$$

using the same $E_1, \ldots, E_p$ as in equation (3). Thus $L$ will be invertible, by the Invertible
Matrix Theorem, with $(E_p \cdots E_1) = L^{-1}$. From (3), $L^{-1}A = U$, and $A = LU$. So
step 2 will produce an acceptable $L$.


EXAMPLE 2 Find an LU factorization of

$$A = \begin{bmatrix} 2 & 4 & -1 & 5 & -2 \\ -4 & -5 & 3 & -8 & 1 \\ 2 & -5 & -4 & 1 & 8 \\ -6 & 0 & 7 & -3 & 1 \end{bmatrix}$$

SOLUTION Since $A$ has four rows, $L$ should be $4 \times 4$. The first column of $L$ is the first
column of $A$ divided by the top pivot entry:

$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 1 & & 1 & 0 \\ -3 & & & 1 \end{bmatrix}$$

Compare the first columns of $A$ and $L$. The row operations that create zeros in the
first column of $A$ will also create zeros in the first column of $L$. To make this same
correspondence of row operations on $A$ hold for the rest of $L$, watch a row reduction of
$A$ to an echelon form $U$. That is, highlight the entries in each matrix that are used to
determine the sequence of row operations that transform $A$ into $U$. [See the highlighted
entries in equation (5).]

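$$
\begin{aligned}
A &= \begin{bmatrix} 2 & 4 & -1 & 5 & -2 \\ -4 & -5 & 3 & -8 & 1 \\ 2 & -5 & -4 & 1 & 8 \\ -6 & 0 & 7 & -3 & 1 \end{bmatrix} \sim \begin{bmatrix} 2 & 4 & -1 & 5 & -2 \\ 0 & 3 & 1 & 2 & -3 \\ 0 & -9 & -3 & -4 & 10 \\ 0 & 12 & 4 & 12 & -5 \end{bmatrix} = A_1 \\
&\sim A_2 = \begin{bmatrix} 2 & 4 & -1 & 5 & -2 \\ 0 & 3 & 1 & 2 & -3 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 4 & 7 \end{bmatrix} \sim \begin{bmatrix} 2 & 4 & -1 & 5 & -2 \\ 0 & 3 & 1 & 2 & -3 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 5 \end{bmatrix} = U
\end{aligned}
\tag{5}
$$

Permuted LU Factorizations 2-23

The highlighted entries above determine the row reduction of $A$ to $U$. At each pivot
column, divide the highlighted entries by the pivot and place the result into $L$:

$$\begin{bmatrix} 1 & & & \\ -2 & 1 & & \\ 1 & -3 & 1 & \\ -3 & 4 & 2 & 1 \end{bmatrix}, \quad\text{and}\quad L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 1 & -3 & 1 & 0 \\ -3 & 4 & 2 & 1 \end{bmatrix}$$

An easy calculation verifies that this $L$ and $U$ satisfy $LU = A$. ■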
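The procedure of Example 2 can be sketched in code: reduce $A$ using row replacements only and record each multiplier (entry divided by pivot) in $L$. The function name `lu_no_interchange` is an illustrative assumption, and the sketch simply raises an error when a row interchange would be required.

```python
from fractions import Fraction

def lu_no_interchange(A):
    """Return (L, U) with A = LU, using row replacements only."""
    m, n = len(A), len(A[0])
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    row = 0                                   # index of the next pivot row
    for col in range(n):
        if row >= m:
            break
        if U[row][col] == 0:
            if any(U[r][col] != 0 for r in range(row + 1, m)):
                raise ValueError("a row interchange would be needed")
            continue                          # not a pivot column
        pivot = U[row][col]
        for r in range(row + 1, m):
            factor = U[r][col] / pivot        # entry divided by the pivot
            L[r][row] = factor                # stored in the pivot's column of L
            U[r] = [a - factor * b for a, b in zip(U[r], U[row])]
        row += 1
    return L, U

# The matrix A of Example 2.
A = [[2, 4, -1, 5, -2],
     [-4, -5, 3, -8, 1],
     [2, -5, -4, 1, 8],
     [-6, 0, 7, -3, 1]]
L, U = lu_no_interchange(A)
```

Column 3 of $A$ has no pivot, so the loop skips it and the third column of $L$ comes from the pivot in column 4, as in the example.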


In practical work, row interchanges are nearly always needed, because partial pivoting
is used for high accuracy. (Recall that this procedure selects, among the possible
choices for a pivot, an entry in the column having the largest absolute value.) To handle
row interchanges, the LU factorization above can be modified easily to produce an L that
is permuted lower triangular, in the sense that a rearrangement (called a permutation)
of the rows of L can make L (unit) lower triangular. The resulting permuted LU
factorization solves Ax = b in the same way as before, except that the reduction of
[ L b ] to [ I y ] follows the order of the pivots in L from left to right, starting with
the pivot in the first column. A reference to an “LU factorization” usually includes the
possibility that L might be permuted lower triangular. For details, see the Study Guide.


NUMERICAL NOTES

The following operation counts apply to an $n \times n$ dense matrix $A$ (with most
entries nonzero) for $n$ moderately large, say, $n > 30$.¹

1. Computing an LU factorization of $A$ takes about $2n^3/3$ flops (about the same
as row reducing $[\,A \;\; \mathbf{b}\,]$), whereas finding $A^{-1}$ requires about $2n^3$ flops.

2. Solving $L\mathbf{y} = \mathbf{b}$ and $U\mathbf{x} = \mathbf{y}$ requires about $2n^2$ flops, because any $n \times n$
triangular system can be solved in about $n^2$ flops.

3. Multiplication of $\mathbf{b}$ by $A^{-1}$ also requires about $2n^2$ flops, but the result may
not be as accurate as that obtained from $L$ and $U$ (because of roundoff error
when computing both $A^{-1}$ and $A^{-1}\mathbf{b}$).

4. If $A$ is sparse (with mostly zero entries), then $L$ and $U$ may be sparse, too,
whereas $A^{-1}$ is likely to be dense. In this case, a solution of $A\mathbf{x} = \mathbf{b}$ with an
LU factorization is much faster than using $A^{-1}$. See Exercise 31.
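To get a feel for these counts, the approximate formulas can be tabulated for a chosen size. This is a rough sketch that only evaluates the leading-order estimates quoted above; the function names and the sample values of `n` and `p` (the number of right-hand sides) are assumptions for illustration.

```python
def lu_flops(n):
    return 2 * n**3 / 3          # factor A = LU

def inverse_flops(n):
    return 2 * n**3              # compute A^{-1}

def triangular_solves_flops(n):
    return 2 * n**2              # solve Ly = b, then Ux = y

def cost_with_lu(n, p):
    """Factor once, then two triangular solves per right-hand side."""
    return lu_flops(n) + p * triangular_solves_flops(n)

def cost_with_inverse(n, p):
    """Invert once, then one matrix-vector product per right-hand side."""
    return inverse_flops(n) + p * 2 * n**2

n, p = 100, 10                   # hypothetical problem size
print(cost_with_lu(n, p), cost_with_inverse(n, p))
```

Under these estimates the per-right-hand-side cost is the same for both approaches, so the saving comes entirely from the cheaper initial factorization.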


A Matrix Factorization in Electrical Engineering 

Matrix factorization is intimately related to the problem of constructing an electrical 
network with specified properties. The following discussion gives just a glimpse of the 
connection between factorization and circuit design. 


1 See Section 3.8 in Applied Linear Algebra, 3rd ed., by Ben Noble and James W. Daniel (Englewood Cliffs,
NJ: Prentice-Hall, 1988). Recall that for our purposes, a flop is +, −, ×, or ÷.

Suppose the box in Fig. 3 represents some sort of electric circuit, with an input
and output. Record the input voltage and current by $\begin{bmatrix} v_1 \\ i_1 \end{bmatrix}$ (with voltage $v$ in volts and
current $i$ in amps), and record the output voltage and current by $\begin{bmatrix} v_2 \\ i_2 \end{bmatrix}$. Frequently, the
transformation $\begin{bmatrix} v_1 \\ i_1 \end{bmatrix} \mapsto \begin{bmatrix} v_2 \\ i_2 \end{bmatrix}$ is linear. That is, there is a matrix $A$, called the transfer
matrix, such that

$$\begin{bmatrix} v_2 \\ i_2 \end{bmatrix} = A \begin{bmatrix} v_1 \\ i_1 \end{bmatrix}$$

[Figure: a box labeled “electric circuit” between input terminals and output terminals.]

FIGURE 3 A circuit with input and output
terminals.

Figure 4 shows a ladder network, where two circuits (there could be more) are
connected in series, so that the output of one circuit becomes the input of the next circuit.
The left circuit in Fig. 4 is called a series circuit, with resistance $R_1$ (in ohms).

[Figure: a series circuit (left) connected to a shunt circuit (right), with voltages $v_1$, $v_2$, $v_3$ marked.]

FIGURE 4 A ladder network.

The right circuit in Fig. 4 is a shunt circuit, with resistance $R_2$. Using Ohm’s law and
Kirchhoff’s laws, one can show that the transfer matrices of the series and shunt circuits,
respectively, are

$$\underbrace{\begin{bmatrix} 1 & -R_1 \\ 0 & 1 \end{bmatrix}}_{\text{Transfer matrix of series circuit}} \quad\text{and}\quad \underbrace{\begin{bmatrix} 1 & 0 \\ -1/R_2 & 1 \end{bmatrix}}_{\text{Transfer matrix of shunt circuit}}$$

EXAMPLE 3

a. Compute the transfer matrix of the ladder network in Fig. 4.

b. Design a ladder network whose transfer matrix is $\begin{bmatrix} 1 & -8 \\ -.5 & 5 \end{bmatrix}$.

SOLUTION

a. Let $A_1$ and $A_2$ be the transfer matrices of the series and shunt circuits, respectively.
Then an input vector $\mathbf{x}$ is transformed first into $A_1\mathbf{x}$ and then into $A_2(A_1\mathbf{x})$. The series
connection of the circuits corresponds to composition of linear transformations, and
the transfer matrix of the ladder network is (note the order)

$$A_2 A_1 = \begin{bmatrix} 1 & 0 \\ -1/R_2 & 1 \end{bmatrix} \begin{bmatrix} 1 & -R_1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -R_1 \\ -1/R_2 & 1 + R_1/R_2 \end{bmatrix} \tag{6}$$

b. To factor the matrix $\begin{bmatrix} 1 & -8 \\ -.5 & 5 \end{bmatrix}$ into the product of transfer matrices, as in equation (6),
look for $R_1$ and $R_2$ in Fig. 4 to satisfy

$$\begin{bmatrix} 1 & -R_1 \\ -1/R_2 & 1 + R_1/R_2 \end{bmatrix} = \begin{bmatrix} 1 & -8 \\ -.5 & 5 \end{bmatrix}$$

From the (1, 2)-entries, $R_1 = 8$ ohms, and from the (2, 1)-entries, $1/R_2 = .5$ ohm
and $R_2 = 1/.5 = 2$ ohms. With these values, the network in Fig. 4 has the desired
transfer matrix. ■
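The computation in Example 3 can be reproduced by multiplying the transfer matrices in the order the circuits are traversed. In this sketch the helper names (`series`, `shunt`, `ladder`) are illustrative assumptions; the matrices themselves are the series and shunt transfer matrices given above.

```python
from fractions import Fraction

def series(R):
    """Transfer matrix of a series circuit with resistance R."""
    return [[1, -R], [0, 1]]

def shunt(R):
    """Transfer matrix of a shunt circuit with resistance R."""
    return [[1, 0], [Fraction(-1) / R, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def ladder(*circuits):
    """Circuits are listed from input to output; each later circuit
    multiplies on the left, so the result is A_k ... A_2 A_1."""
    total = [[1, 0], [0, 1]]
    for M in circuits:
        total = matmul(M, total)
    return total

# Example 3(b): R1 = 8 ohms (series) followed by R2 = 2 ohms (shunt).
T = ladder(series(8), shunt(2))
```

Reversing the order of the two circuits generally gives a different transfer matrix, which is why the text says to note the order of the product.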

A network transfer matrix summarizes the input-output behavior (the design specifications)
of the network without reference to the interior circuits. To physically build
a network with specified properties, an engineer first determines if such a network can
be constructed (or realized). Then the engineer tries to factor the transfer matrix into 
matrices corresponding to smaller circuits that perhaps are already manufactured and 
ready for assembly. In the common case of alternating current, the entries in the transfer 
matrix are usually rational complex-valued functions. (See Exercises 19 and 20 in 
Section 2.4 and Example 2 in Section 3.3.) A standard problem is to find a minimal 
realization that uses the smallest number of electrical components. 



PRACTICE PROBLEM

Find an LU factorization of $A = \begin{bmatrix} 2 & -4 & -2 & 3 \\ 6 & -9 & -5 & 8 \\ 2 & -7 & -3 & 9 \\ 4 & -2 & -2 & -1 \\ -6 & 3 & 3 & 4 \end{bmatrix}$. [Note: It will turn out that $A$
has only three pivot columns, so the method of Example 2 will produce only the first
three columns of $L$. The remaining two columns of $L$ come from $I_5$.]


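2.5 EXERCISES

In Exercises 1-6, solve the equation $A\mathbf{x} = \mathbf{b}$ by using the LU
factorization given for $A$. In Exercises 1 and 2, also solve $A\mathbf{x} = \mathbf{b}$
by ordinary row reduction.

1. $A = \begin{bmatrix} 3 & -7 & -2 \\ -3 & 5 & 1 \\ 6 & -4 & 0 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} -7 \\ 5 \\ 2 \end{bmatrix}$

$A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 2 & -5 & 1 \end{bmatrix} \begin{bmatrix} 3 & -7 & -2 \\ 0 & -2 & -1 \\ 0 & 0 & -1 \end{bmatrix}$

2. $A = \begin{bmatrix} 2 & -6 & 4 \\ -4 & 8 & 0 \\ 0 & -4 & 6 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 2 \\ -4 \\ 6 \end{bmatrix}$

$A = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 2 & -6 & 4 \\ 0 & -4 & 8 \\ 0 & 0 & -2 \end{bmatrix}$

3. $A = \begin{bmatrix} 2 & -4 & 2 \\ -4 & 5 & 2 \\ 6 & -9 & 1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 6 \\ 0 \\ 6 \end{bmatrix}$

$A = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 3 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & -4 & 2 \\ 0 & -3 & 6 \\ 0 & 0 & 1 \end{bmatrix}$

4. $A = \begin{bmatrix} 1 & -1 & 2 \\ 1 & -3 & 1 \\ 3 & 7 & 5 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 0 \\ -5 \\ 7 \end{bmatrix}$

$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 3 & -5 & 1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 2 \\ 0 & -2 & -1 \\ 0 & 0 & -6 \end{bmatrix}$

5. $A = \begin{bmatrix} 1 & -2 & -2 & -3 \\ 3 & -9 & 0 & -9 \\ -1 & 2 & 4 & 7 \\ -3 & -6 & 26 & 2 \end{bmatrix}$, $\mathbf{b}$ = [illegible]

$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 3 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -3 & 4 & -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & -2 & -2 & -3 \\ 0 & -3 & 6 & 0 \\ 0 & 0 & 2 & 4 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

6. A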

J U 4 —JO - 

^ 2 

13 2 0 

0 3 0 12 

0 0-2 0 

0 0 0 

Find an LU factorization of the matrices in Exercises 7-16 (with L 
unit lower triangular). Note that MATLAB will usually produce 
a permuted LU factorization because it uses partial pivoting for 
numerical accuracy. 


7. 


3 

1 

2" 


"-5 

0 

4" 

-9 

0 

-4 

10 . 

10 

2 

-5 

9 

9 

14 


10 

10 

16 


"1 

3 

-5 

-3" 





3 

1 5" 

-1 

-5 

8 

4 

14. 

5 

20 

6 31 

4 

2 
一 4 

-5 

7 

-7 

5_ 

-2 


-1 - 

7 

-1 — 

1 

4 

7_ 







- 

2 

-3 

4" 


2 

0 

5 

2" 




-4 

8 

-7 


—6 

3 

-13 

-3 

16. 


6 

-5 

14 


4 

9 

16 

17 




-6 

9 

-12 









8 

—6 

19 



9. 


11 . 


13. 


15. 


17. When $A$ is invertible, MATLAB finds $A^{-1}$ by factoring
$A = LU$ (where $L$ may be permuted lower triangular), inverting
$L$ and $U$, and then computing $U^{-1}L^{-1}$. Use this
method to compute the inverse of $A$ in Exercise 2. (Apply
the algorithm in Section 2.2 to $L$ and to $U$.)

18. Find $A^{-1}$ as in Exercise 17, using $A$ from Exercise 3.

19. Let $A$ be a lower triangular $n \times n$ matrix with nonzero entries
on the diagonal. Show that $A$ is invertible and $A^{-1}$ is lower
triangular. [Hint: Explain why $A$ can be changed into $I$
using only row replacements and scaling. (Where are the
pivots?) Also, explain why the row operations that reduce
$A$ to $I$ change $I$ into a lower triangular matrix.]

20. Let $A = LU$ be an LU factorization. Explain why $A$ can be
row reduced to $U$ using only replacement operations. (This
fact is the converse of what was proved in the text.)

21. Suppose $A = BC$, where $B$ is invertible. Show that any
sequence of row operations that reduces $B$ to $I$ also reduces
$A$ to $C$. The converse is not true, since the zero matrix may
be factored as $0 = B \cdot 0$.

Exercises 22-26 provide a glimpse of some widely used matrix 

factorizations, some of which are discussed later in the text. 


22. (Reduced LU Factorization) With $A$ as in the Practice Problem,
find a $5 \times 3$ matrix $B$ and a $3 \times 4$ matrix $C$ such that
$A = BC$. Generalize this idea to the case where $A$ is $m \times n$,
$A = LU$, and $U$ has only three nonzero rows.

23. (Rank Factorization) Suppose an $m \times n$ matrix $A$ admits a
factorization $A = CD$ where $C$ is $m \times 4$ and $D$ is $4 \times n$.

a. Show that $A$ is the sum of four outer products. (See
Section 2.4.)

b. Let $m = 400$ and $n = 100$. Explain why a computer programmer
might prefer to store the data from $A$ in the form
of two matrices $C$ and $D$.

24. (QR Factorization) Suppose $A = QR$, where $Q$ and $R$ are
$n \times n$, $R$ is invertible and upper triangular, and $Q$ has the
property that $Q^T Q = I$. Show that for each $\mathbf{b}$ in $\mathbb{R}^n$, the
equation $A\mathbf{x} = \mathbf{b}$ has a unique solution. What computations
with $Q$ and $R$ will produce the solution?

25. (Singular Value Decomposition) Suppose $A = UDV^T$,
where $U$ and $V$ are $n \times n$ matrices with the property that
$U^T U = I$ and $V^T V = I$, and where $D$ is a diagonal matrix
with positive numbers $\sigma_1, \ldots, \sigma_n$ on the diagonal. Show that
$A$ is invertible, and find a formula for $A^{-1}$.

26. (Spectral Factorization) Suppose a $3 \times 3$ matrix $A$ admits
a factorization as $A = PDP^{-1}$, where $P$ is some invertible
$3 \times 3$ matrix and $D$ is the diagonal matrix

D

Show that this factorization is useful when computing high
powers of $A$. Find fairly simple formulas for $A^2$, $A^3$, and $A^k$
($k$ a positive integer), using $P$ and the entries in $D$.

27. Design two different ladder networks that each output 9 volts 
and 4 amps when the input is 12 volts and 6 amps. 

28. Show that if three shunt circuits (with resistances Ri, R 2 , R 3 ) 
are connected in series, the resulting network has the same 
transfer matrix as a single shunt circuit. Find a formula for 
the resistance in that circuit. 

29. a. Compute the transfer matrix of the network in the figure
below.

[Figure: ladder network with currents $i_1, \ldots, i_4$, voltages $v_1, \ldots, v_4$, and resistances $R_1$, $R_2$, $R_3$.]

b. Let $A = \begin{bmatrix} 3 & -12 \\ -1/3 & 5/3 \end{bmatrix}$. Design a ladder network
whose transfer matrix is $A$ by finding a suitable matrix
factorization of $A$.


2 0 0 

0 3 0 

0 0 1 


3 

7 

2 


2 

3 

2 

6 

19 

4 

12 . 

4 

13 

9 

-3 

-2 

3 


—6 

5 

4 


8 . 


6 

12 


WEB 


0 2 6 9- 

13 4 0 0 0 


2 4 4 8 0 0 1 


0 3 0 13 4 


12 3 5 
30. Find a different factorization of the transfer matrix $A$ in
Exercise 29, and thereby design a different ladder network
whose transfer matrix is $A$.

31. [M] Consider the heat plate in the following figure (refer to
Exercise 33 in Section 1.1).

[Figure: rectangular plate with interior nodes 1-8 in two rows (1, 3, 5, 7 above 2, 4, 6, 8); the boundary temperatures shown include 0°, 10°, and 20°.]

The solution to the steady-state heat flow problem for this
plate is approximated by the solution to the equation $A\mathbf{x} = \mathbf{b}$,
where $\mathbf{b} = (5, 15, 0, 10, 0, 10, 20, 30)$ and

$$A = \begin{bmatrix} 4 & -1 & -1 & & & & & \\ -1 & 4 & 0 & -1 & & & & \\ -1 & 0 & 4 & -1 & -1 & & & \\ & -1 & -1 & 4 & 0 & -1 & & \\ & & -1 & 0 & 4 & -1 & -1 & \\ & & & -1 & -1 & 4 & 0 & -1 \\ & & & & -1 & 0 & 4 & -1 \\ & & & & & -1 & -1 & 4 \end{bmatrix}$$

The missing entries in $A$ are zeros. The nonzero entries of
$A$ lie within a band along the main diagonal. Such band
matrices occur in a variety of applications and often are
extremely large (with thousands of rows and columns but
relatively narrow bands).

a. Use the method in Example 2 to construct an LU factorization
of $A$, and note that both factors are band matrices
(with two nonzero diagonals below or above the main
diagonal). Compute $LU - A$ to check your work.

b. Use the LU factorization to solve $A\mathbf{x} = \mathbf{b}$.

c. Obtain $A^{-1}$ and note that $A^{-1}$ is a dense matrix with no
band structure. When $A$ is large, $L$ and $U$ can be stored
in much less space than $A^{-1}$. This fact is another reason
for preferring the LU factorization of $A$ to $A^{-1}$ itself.

32. [M] The band matrix $A$ shown below can be used to estimate
the unsteady conduction of heat in a rod when the temperatures
at points $p_1, \ldots, p_4$ on the rod change with time.²

[Figure: a rod with points $p_1$, $p_2$, $p_3$, $p_4$ spaced $\Delta x$ apart.]

The constant $C$ in the matrix depends on the physical nature
of the rod, the distance $\Delta x$ between the points on the rod,
and the length of time $\Delta t$ between successive temperature
measurements. Suppose that for $k = 0, 1, 2, \ldots$, a vector $\mathbf{t}_k$
in $\mathbb{R}^4$ lists the temperatures at time $k\Delta t$. If the two ends of the
rod are maintained at 0°, then the temperature vectors satisfy
the equation $A\mathbf{t}_{k+1} = \mathbf{t}_k$ ($k = 0, 1, \ldots$), where

$$A = \begin{bmatrix} (1 + 2C) & -C & & \\ -C & (1 + 2C) & -C & \\ & -C & (1 + 2C) & -C \\ & & -C & (1 + 2C) \end{bmatrix}$$

a. Find the LU factorization of $A$ when $C = 1$. A matrix
such as $A$ with three nonzero diagonals is called a tridiagonal
matrix. The $L$ and $U$ factors are bidiagonal
matrices.

b. Suppose $C = 1$ and $\mathbf{t}_0 = (10, 15, 15, 10)^T$. Use the LU
factorization of $A$ to find the temperature distributions $\mathbf{t}_1$,
$\mathbf{t}_2$, $\mathbf{t}_3$, and $\mathbf{t}_4$.

2 See Biswa N. Datta, Numerical Linear Algebra and Applications (Pacific
Grove, CA: Brooks/Cole, 1994), pp. 200-201.


SOLUTION TO PRACTICE PROBLEM 



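\[
A = \begin{bmatrix}
 2 & -4 & -2 &  3 \\
 6 & -9 & -5 &  8 \\
 2 & -7 & -3 &  9 \\
 4 & -2 & -2 & -1 \\
-6 &  3 &  3 &  4
\end{bmatrix}
\sim
\begin{bmatrix}
 2 & -4 & -2 &  3 \\
 0 &  3 &  1 & -1 \\
 0 & -3 & -1 &  6 \\
 0 &  6 &  2 & -7 \\
 0 & -9 & -3 & 13
\end{bmatrix}
\]

\[
\sim
\begin{bmatrix}
 2 & -4 & -2 &  3 \\
 0 &  3 &  1 & -1 \\
 0 &  0 &  0 &  5 \\
 0 &  0 &  0 & -5 \\
 0 &  0 &  0 & 10
\end{bmatrix}
\sim
\begin{bmatrix}
 2 & -4 & -2 &  3 \\
 0 &  3 &  1 & -1 \\
 0 &  0 &  0 &  5 \\
 0 &  0 &  0 &  0 \\
 0 &  0 &  0 &  0
\end{bmatrix}
= U
\]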


Divide the entries in each highlighted column (each pivot together with the entries 
below it) by the pivot at the top. The resulting columns form the first three columns 
in the lower half of L. This suffices to make row reduction of L to I correspond to 
reduction of A to U. Use the last two columns of $I_5$ to make L unit lower triangular. 


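\[
\begin{bmatrix} 2 \\ 6 \\ 2 \\ 4 \\ -6 \end{bmatrix} \div 2, \quad
\begin{bmatrix} 3 \\ -3 \\ 6 \\ -9 \end{bmatrix} \div 3, \quad
\begin{bmatrix} 5 \\ -5 \\ 10 \end{bmatrix} \div 5
\;\longrightarrow\;
\begin{bmatrix}
 1 &    &    \\
 3 &  1 &    \\
 1 & -1 &  1 \\
 2 &  2 & -1 \\
-3 & -3 &  2
\end{bmatrix},
\qquad
L = \begin{bmatrix}
 1 &  0 &  0 & 0 & 0 \\
 3 &  1 &  0 & 0 & 0 \\
 1 & -1 &  1 & 0 & 0 \\
 2 &  2 & -1 & 1 & 0 \\
-3 & -3 &  2 & 0 & 1
\end{bmatrix}
\]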


2.6 THE LEONTIEF INPUT-OUTPUT MODEL 


Linear algebra played an essential role in the Nobel prize-winning work of Wassily 
Leontief, as mentioned at the beginning of Chapter 1. The economic model described 
in this section is the basis for more elaborate models used in many parts of the world. 

Suppose a nation’s economy is divided into n sectors that produce goods or services, 
and let x be a production vector in $\mathbb{R}^n$ that lists the output of each sector for one 
year. Also, suppose another part of the economy (called the open sector) does not 
produce goods or services but only consumes them, and let d be a final demand vector 
(or bill of final demands) that lists the values of the goods and services demanded 
from the various sectors by the nonproductive part of the economy. The vector d can 
represent consumer demand, government consumption, surplus production, exports, or 
other external demands. 

As the various sectors produce goods to meet consumer demand, the producers 
themselves create additional intermediate demand for goods they need as inputs for 
their own production. The interrelations between the sectors are very complex, and the 
connection between the final demand and the production is unclear. Leontief asked if 
there is a production level x such that the amounts produced (or “supplied”) will exactly 
balance the total demand for that production, so that 

\[
\{\text{amount produced}\} = \{\text{intermediate demand}\} + \{\text{final demand}\} \tag{1}
\]

The basic assumption of Leontief’s input-output model is that for each sector, there is 
a unit consumption vector in $\mathbb{R}^n$ that lists the inputs needed per unit of output of the 
sector. All input and output units are measured in millions of dollars, rather than in 
quantities such as tons or bushels. (Prices of goods and services are held constant.) 

As a simple example, suppose the economy consists of three sectors (manufacturing, 
agriculture, and services) with unit consumption vectors $\mathbf{c}_1$, $\mathbf{c}_2$, and $\mathbf{c}_3$, as shown 
in the table that follows. 

                     Inputs Consumed per Unit of Output
Purchased from:      Manufacturing    Agriculture    Services
Manufacturing             .50             .40           .20
Agriculture               .20             .30           .10
Services                  .10             .10           .30
                           ↑               ↑             ↑
                          c_1             c_2           c_3


EXAMPLE 1 What amounts will be consumed by the manufacturing sector if it 
decides to produce 100 units? 


SOLUTION Compute 

\[
100\,\mathbf{c}_1 = 100 \begin{bmatrix} .50 \\ .20 \\ .10 \end{bmatrix}
= \begin{bmatrix} 50 \\ 20 \\ 10 \end{bmatrix}
\]

To produce 100 units, manufacturing will order (i.e., “demand”) and consume 50 units 
from other parts of the manufacturing sector, 20 units from agriculture, and 10 units 
from services. ■ 


If manufacturing decides to produce $x_1$ units of output, then $x_1\mathbf{c}_1$ represents the 
intermediate demands of manufacturing, because the amounts in $x_1\mathbf{c}_1$ will be consumed in 
the process of creating the $x_1$ units of output. Likewise, if $x_2$ and $x_3$ denote the planned 
outputs of the agriculture and services sectors, $x_2\mathbf{c}_2$ and $x_3\mathbf{c}_3$ list their corresponding 
intermediate demands. The total intermediate demand from all three sectors is given by 

\[
\{\text{intermediate demand}\} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + x_3\mathbf{c}_3 = C\mathbf{x} \tag{2}
\]


where C is the consumption matrix $[\,\mathbf{c}_1 \ \ \mathbf{c}_2 \ \ \mathbf{c}_3\,]$, namely, 

\[
C = \begin{bmatrix}
.50 & .40 & .20 \\
.20 & .30 & .10 \\
.10 & .10 & .30
\end{bmatrix} \tag{3}
\]


Equations (1) and (2) yield Leontief’s model. 


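THE LEONTIEF INPUT-OUTPUT MODEL, OR PRODUCTION EQUATION 

\[
\underbrace{\mathbf{x}}_{\text{amount produced}}
= \underbrace{C\mathbf{x}}_{\text{intermediate demand}}
+ \underbrace{\mathbf{d}}_{\text{final demand}} \tag{4}
\]

Equation (4) may also be written as Ix − Cx = d, or 

\[
(I - C)\mathbf{x} = \mathbf{d} \tag{5}
\]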

EXAMPLE 2 Consider the economy whose consumption matrix is given by (3). 
Suppose the final demand is 50 units for manufacturing, 30 units for agriculture, and 
20 units for services. Find the production level x that will satisfy this demand. 

SOLUTION The coefficient matrix in (5) is 

\[
I - C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
- \begin{bmatrix} .5 & .4 & .2 \\ .2 & .3 & .1 \\ .1 & .1 & .3 \end{bmatrix}
= \begin{bmatrix} .5 & -.4 & -.2 \\ -.2 & .7 & -.1 \\ -.1 & -.1 & .7 \end{bmatrix}
\]

To solve (5), row reduce the augmented matrix 

\[
\begin{bmatrix} .5 & -.4 & -.2 & 50 \\ -.2 & .7 & -.1 & 30 \\ -.1 & -.1 & .7 & 20 \end{bmatrix}
\sim
\begin{bmatrix} 5 & -4 & -2 & 500 \\ -2 & 7 & -1 & 300 \\ -1 & -1 & 7 & 200 \end{bmatrix}
\sim \cdots \sim
\begin{bmatrix} 1 & 0 & 0 & 226 \\ 0 & 1 & 0 & 119 \\ 0 & 0 & 1 & 78 \end{bmatrix}
\]

The last column is rounded to the nearest whole unit. Manufacturing must produce 
approximately 226 units, agriculture 119 units, and services only 78 units. ■ 
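
The row reduction in Example 2 can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `solve` is an illustrative choice, not something from the text, and it performs ordinary Gaussian elimination with partial pivoting on the system (I − C)x = d.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot candidate.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Consumption matrix (3) and the final demand from Example 2.
C = [[0.50, 0.40, 0.20],
     [0.20, 0.30, 0.10],
     [0.10, 0.10, 0.30]]
d = [50.0, 30.0, 20.0]

n = len(d)
I_minus_C = [[(1.0 if i == j else 0.0) - C[i][j] for j in range(n)]
             for i in range(n)]
x = solve(I_minus_C, d)
print([round(v) for v in x])  # production levels rounded to whole units
```

Rounded to whole units, the computed entries agree with the last column of the reduced augmented matrix.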

If the matrix I − C is invertible, then we can apply Theorem 5 in Section 2.2, with 
A replaced by (I − C), and from the equation (I − C)x = d obtain $\mathbf{x} = (I - C)^{-1}\mathbf{d}$. 
The theorem below shows that in most practical cases, I − C is invertible and the 
production vector x is economically feasible, in the sense that the entries in x are 
nonnegative. 

In the theorem, the term column sum denotes the sum of the entries in a column 
of a matrix. Under ordinary circumstances, the column sums of a consumption matrix 
are less than 1 because a sector should require less than one unit’s worth of inputs to 
produce one unit of output. 


THEOREM 11 Let C be the consumption matrix for an economy, and let d be the final demand. 
If C and d have nonnegative entries and if each column sum of C is less than 1, 
then $(I - C)^{-1}$ exists and the production vector 

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d}
\]

has nonnegative entries and is the unique solution of 

\[
\mathbf{x} = C\mathbf{x} + \mathbf{d}
\]


The following discussion will suggest why the theorem is true and will lead to a 
new way to compute $(I - C)^{-1}$. 

A Formula for $(I - C)^{-1}$ 

Imagine that the demand represented by d is presented to the various industries at the 
beginning of the year, and the industries respond by setting their production levels at 
x = d, which will exactly meet the final demand. As the industries prepare to produce d, 
they send out orders for their raw materials and other inputs. This creates an intermediate 
demand of Cd for inputs. 

To meet the additional demand of Cd, the industries will need as additional inputs 
the amounts in $C(C\mathbf{d}) = C^2\mathbf{d}$. Of course, this creates a second round of intermediate 
demand, and when the industries decide to produce even more to meet this new demand, 
they create a third round of demand, namely, $C(C^2\mathbf{d}) = C^3\mathbf{d}$. And so it goes. 

Theoretically, this process could continue indefinitely, although in real life it would 
not take place in such a rigid sequence of events. We can diagram this hypothetical 
situation as follows: 

                         Demand That        Inputs Needed to
                         Must Be Met        Meet This Demand
Final demand             d                  Cd
Intermediate demand
  1st round              Cd                 C(Cd) = C^2 d
  2nd round              C^2 d              C(C^2 d) = C^3 d
  3rd round              C^3 d              C(C^3 d) = C^4 d


The production level x that will meet all of this demand is 

\[
\begin{aligned}
\mathbf{x} &= \mathbf{d} + C\mathbf{d} + C^2\mathbf{d} + C^3\mathbf{d} + \cdots \\
           &= (I + C + C^2 + C^3 + \cdots)\,\mathbf{d}
\end{aligned} \tag{6}
\]

To make sense of equation (6), consider the following algebraic identity: 

\[
(I - C)(I + C + C^2 + \cdots + C^m) = I - C^{m+1} \tag{7}
\]

It can be shown that if the column sums in C are all strictly less than 1, then I − C is 
invertible, $C^m$ approaches the zero matrix as m gets arbitrarily large, and $I - C^{m+1} \to I$. 
(This fact is analogous to the fact that if a positive number t is less than 1, then $t^m \to 0$ 
as m increases.) Using equation (7), write 

\[
(I - C)^{-1} \approx I + C + C^2 + C^3 + \cdots + C^m \tag{8}
\]

when the column sums of C are less than 1. 


The approximation in (8) means that the right side can be made as close to $(I - C)^{-1}$ 
as desired by taking m sufficiently large. 

In actual input-output models, powers of the consumption matrix approach the zero 
matrix rather quickly. So (8) really provides a practical way to compute $(I - C)^{-1}$. 
Likewise, for any d, the vectors $C^m\mathbf{d}$ approach the zero vector quickly, and (6) is a 
practical way to solve (I − C)x = d. If the entries in C and d are nonnegative, then (6) 
shows that the entries in x are nonnegative, too. 
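
As an illustration of how equation (6) can be used in practice, the sketch below accumulates the rounds of demand d, Cd, C²d, ... for the three-sector economy of Example 2 until the newest round is negligible. The function name `leontief_series` and the stopping tolerance `tol` are assumptions made for this example, not part of the text.

```python
def mat_vec(a, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def leontief_series(C, d, tol=1e-9, max_rounds=10_000):
    """Sum d + Cd + C^2 d + ... until the latest round is below tol."""
    x = d[:]        # running total of demand to be met
    term = d[:]     # current round of demand, C^m d
    for rounds in range(1, max_rounds + 1):
        term = mat_vec(C, term)                    # next round of intermediate demand
        x = [xi + ti for xi, ti in zip(x, term)]
        if max(abs(t) for t in term) < tol:
            return x, rounds
    raise RuntimeError("series did not converge; are the column sums of C below 1?")


C = [[0.50, 0.40, 0.20],
     [0.20, 0.30, 0.10],
     [0.10, 0.10, 0.30]]
d = [50.0, 30.0, 20.0]

x, rounds = leontief_series(C, d)
print([round(v) for v in x], "after", rounds, "rounds")
```

Because every column sum of this C is at most .8, each round of demand is at most .8 times the size of the previous one (measured by the sum of its entries), which is why the loop terminates.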

The Economic Importance of Entries in $(I - C)^{-1}$ 

The entries in $(I - C)^{-1}$ are significant because they can be used to predict how the 
production x will have to change when the final demand d changes. In fact, the entries 
in column j of $(I - C)^{-1}$ are the increased amounts the various sectors will have to 
produce in order to satisfy an increase of 1 unit in the final demand for output from 
sector j. See Exercise 8. 

NUMERICAL NOTE 

In any applied problem (not just in economics), an equation Ax = b can always 
be written as (I − C)x = b, with C = I − A. If the system is large and sparse 
(with mostly zero entries), it can happen that the column sums of the absolute 
values in C are less than 1. In this case, $C^m \to 0$. If $C^m$ approaches zero 
quickly enough, (6) and (8) will provide practical formulas for solving Ax = b 
and finding $A^{-1}$. 

PRACTICE PROBLEM 

Suppose an economy has two sectors: goods and services. One unit of output from 
goods requires inputs of .2 unit from goods and .5 unit from services. One unit of 
output from services requires inputs of .4 unit from goods and .3 unit from services. 
There is a final demand of 20 units of goods and 30 units of services. Set up the Leontief 
input-output model for this situation. 


2.6 EXERCISES 



Exercises 1-4 refer to an economy that is divided into three 
sectors — manufacturing, agriculture, and services. For each unit 
of output, manufacturing requires .10 unit from other companies 
in that sector, .30 unit from agriculture, and .30 unit from services. 
For each unit of output, agriculture uses .20 unit of its own output, 
.60 unit from manufacturing, and .10 unit from services. For each 
unit of output, the services sector consumes .10 unit from services, 
.60 unit from manufacturing, but no agricultural products. 

1. Construct the consumption matrix for this economy, and 
determine what intermediate demands are created if agriculture 
plans to produce 100 units. 


2. Determine the production levels needed to satisfy a final 
demand of 20 units for agriculture, with no final demand for 
the other sectors. (Do not compute an inverse matrix.) 


3. Determine the production levels needed to satisfy a final 
demand of 20 units for manufacturing, with no final demand 
for the other sectors. (Do not compute an inverse matrix.) 


4. Determine the production levels needed to satisfy a final 
demand of 20 units for manufacturing, 20 units for agriculture, 
and 0 units for services. 


5. Consider the production model x = Cx + d for an economy 
with two sectors, where 

\[
C = [\,\ldots\,] \quad \text{and} \quad \mathbf{d} = \begin{bmatrix} 50 \\ 30 \end{bmatrix}
\]

Use an inverse matrix to determine the production level 
necessary to satisfy the final demand. 


6. Repeat Exercise 5 with 

\[
C = [\,\ldots\,] \quad \text{and} \quad \mathbf{d} = \begin{bmatrix} 16 \\ 12 \end{bmatrix}
\]

7. Let C and d be as in Exercise 5. 

a. Determine the production level necessary to satisfy a final 
demand for 1 unit of output from sector 1. 

b. Use an inverse matrix to determine the production level 
necessary to satisfy a final demand of $\begin{bmatrix} 51 \\ 30 \end{bmatrix}$. 

c. Use the fact that 
$\begin{bmatrix} 51 \\ 30 \end{bmatrix} = \begin{bmatrix} 50 \\ 30 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ 
to explain how and why the answers to parts (a) and (b) and to Exercise 
5 are related. 


8. Let C be an n × n consumption matrix whose column sums 
are less than 1. Let x be the production vector that satisfies 
a final demand d, and let $\Delta\mathbf{x}$ be a production vector that 
satisfies a different final demand $\Delta\mathbf{d}$. 

a. Show that if the final demand changes from d to $\mathbf{d} + \Delta\mathbf{d}$, 
then the new production level must be $\mathbf{x} + \Delta\mathbf{x}$. Thus $\Delta\mathbf{x}$ 
gives the amounts by which production must change in 
order to accommodate the change $\Delta\mathbf{d}$ in demand. 

b. Let $\Delta\mathbf{d}$ be the vector in $\mathbb{R}^n$ with 1 as the first entry and 
0’s elsewhere. Explain why the corresponding production 
$\Delta\mathbf{x}$ is the first column of $(I - C)^{-1}$. This shows that the 
first column of $(I - C)^{-1}$ gives the amounts the various 
sectors must produce to satisfy an increase of 1 unit in the 
final demand for output from sector 1. 

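9. Solve the Leontief production equation for an economy with 
three sectors, given that 

\[
C = \begin{bmatrix} .2 & .2 & .0 \\ .3 & .1 & .3 \\ .1 & .0 & .2 \end{bmatrix}
\quad \text{and} \quad
\mathbf{d} = \begin{bmatrix} 40 \\ 60 \\ 80 \end{bmatrix}
\]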


10. The consumption matrix C for the U.S. economy in 1972 
has the property that every entry in the matrix $(I - C)^{-1}$ is 
nonzero (and positive). 1 What does that say about the effect 
of raising the demand for the output of just one sector of the 
economy? 

11. The Leontief production equation, x = Cx + d, is usually 
accompanied by a dual price equation, 

\[
\mathbf{p} = C^T\mathbf{p} + \mathbf{v}
\]

where p is a price vector whose entries list the price per unit 
for each sector’s output, and v is a value added vector whose 
entries list the value added per unit of output. (Value added 
includes wages, profit, depreciation, etc.) An important fact 
in economics is that the gross domestic product (GDP) can 
be expressed in two ways: 

\[
\{\text{gross domestic product}\} = \mathbf{p}^T\mathbf{d} = \mathbf{v}^T\mathbf{x}
\]

Verify the second equality. [Hint: Compute $\mathbf{p}^T\mathbf{x}$ in two 
ways.] 


12. Let C be a consumption matrix such that $C^m \to 0$ as 
$m \to \infty$, and for m = 1, 2, ..., let $D_m = I + C + \cdots + C^m$. 
Find a difference equation that relates $D_m$ and $D_{m+1}$ 
and thereby obtain an iterative procedure for computing 
formula (8) for $(I - C)^{-1}$. 


13. [M] The consumption matrix C below is based on 
input-output data for the U.S. economy in 1958, with data 
for 81 sectors grouped into 7 larger sectors: (1) nonmetal 
household and personal products, (2) final metal products 
(such as motor vehicles), (3) basic metal products and 
mining, (4) basic nonmetal products and agriculture, (5) 
energy, (6) services, and (7) entertainment and miscellaneous 
products. 2 Find the production levels needed to satisfy the 
final demand d. (Units are in millions of dollars.) 


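\[
C = \begin{bmatrix}
.1588 & .0064 & .0025 & .0304 & .0014 & .0083 & .1594 \\
.0057 & .2645 & .0436 & .0099 & .0083 & .0201 & .3413 \\
.0264 & .1506 & .3557 & .0139 & .0142 & .0070 & .0236 \\
.3299 & .0565 & .0495 & .3636 & .0204 & .0483 & .0649 \\
.0089 & .0081 & .0333 & .0295 & .3412 & .0237 & .0020 \\
.1190 & .0901 & .0996 & .1260 & .1722 & .2368 & .3369 \\
.0063 & .0126 & .0196 & .0098 & .0064 & .0132 & .0012
\end{bmatrix},
\qquad
\mathbf{d} = \begin{bmatrix}
74{,}000 \\ 56{,}000 \\ 10{,}500 \\ 25{,}000 \\ 17{,}500 \\ 196{,}000 \\ 5{,}000
\end{bmatrix}
\]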


14. [M] The demand vector in Exercise 13 is reasonable for 
1958 data, but Leontief’s discussion of the economy in the 
reference cited there used a demand vector closer to 1964 
data: 


d = (99640,75548,14444, 33501,23527,263985,6526) 
Find the production levels needed to satisfy this demand. 


15. [M] Use equation (6) to solve the problem in Exercise 13. Set 
$\mathbf{x}^{(0)} = \mathbf{d}$, and for k = 1, 2, ..., compute $\mathbf{x}^{(k)} = \mathbf{d} + C\mathbf{x}^{(k-1)}$. 
How many steps are needed to obtain the answer in Exercise 13 
to four significant figures? 


1 Wassily W. Leontief, “The World Economy of the Year 2000,” Scientific 
American, September 1980, pp. 206-231. 


2 Wassily W. Leontief, “The Structure of the U.S. Economy,” Scientific 
American, April 1965, pp. 30-32. 

SOLUTION TO PRACTICE PROBLEM 


The following data are given: 

                     Inputs Needed per Unit of Output
Purchased from:      Goods      Services      External Demand
Goods                 .2          .4               20
Services              .5          .3               30


The Leontief input-output model is x = Cx + d, where 

\[
C = \begin{bmatrix} .2 & .4 \\ .5 & .3 \end{bmatrix}, \qquad
\mathbf{d} = \begin{bmatrix} 20 \\ 30 \end{bmatrix}
\]


2.7 APPLICATIONS TO COMPUTER GRAPHICS 


FIGURE 1 Regular N. 


Computer graphics are images displayed or animated on a computer screen. Applications 
of computer graphics are widespread and growing rapidly. For instance, computer-aided 
design (CAD) is an integral part of many engineering processes, such as the 
aircraft design process described in the chapter introduction. The entertainment industry 
has made the most spectacular use of computer graphics, from the special effects in The 
Matrix to PlayStation 2 and the Xbox. 

Most interactive computer software for business and industry makes use of computer 
graphics in the screen displays and for other functions, such as graphical display 
of data, desktop publishing, and slide production for commercial and educational 
presentations. Consequently, anyone studying a computer language invariably spends time 
learning how to use at least two-dimensional (2D) graphics. 

This section examines some of the basic mathematics used to manipulate and display 
graphical images such as a wire-frame model of an airplane. Such an image (or 
picture) consists of a number of points, connecting lines or curves, and information 
about how to fill in closed regions bounded by the lines and curves. Often, curved lines 
are approximated by short straight-line segments, and a figure is defined mathematically 
by a list of points. 

Among the simplest 2D graphics symbols are letters used for labels on the screen. 
Some letters are stored as wire-frame objects; others that have curved portions are stored 
with additional mathematical formulas for the curves. 

EXAMPLE 1 The capital letter N in Fig. 1 is determined by eight points, or vertices. 
The coordinates of the points can be stored in a data matrix, D. 

\[
\begin{array}{lcccccccc}
\text{Vertex:}       & 1 & 2  & 3    & 4 & 5 & 6   & 7    & 8 \\
x\text{-coordinate}  & 0 & .5 & .5   & 6 & 6 & 5.5 & 5.5  & 0 \\
y\text{-coordinate}  & 0 & 0  & 6.42 & 0 & 8 & 8   & 1.58 & 8
\end{array}
\;=\; D
\]

In addition to D, it is necessary to specify which vertices are connected by lines, but we 
omit this detail. ■ 

The main reason graphical objects are described by collections of straight-line segments 
is that the standard transformations in computer graphics map line segments onto 
other line segments. (For instance, see Exercise 26 in Section 1.8.) Once the vertices 
that describe an object have been transformed, their images can be connected with the 
appropriate straight lines to produce the complete image of the original object. 

FIGURE 2 Slanted N. 

FIGURE 3 Composite transformation of N. 


EXAMPLE 2 Given $A = \begin{bmatrix} 1 & .25 \\ 0 & 1 \end{bmatrix}$, describe the effect of the shear 
transformation $\mathbf{x} \mapsto A\mathbf{x}$ on the letter N in Example 1. 


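SOLUTION By definition of matrix multiplication, the columns of the product AD 
contain the images of the vertices of the letter N. 

\[
AD = \begin{array}{cccccccc}
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
0 & .5 & 2.105 & 6 & 8 & 7.5 & 5.895 & 2 \\
0 & 0  & 6.420 & 0 & 8 & 8   & 1.580 & 8
\end{array}
\]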


The transformed vertices are plotted in Fig. 2, along with connecting line segments that 
correspond to those in the original figure. ■ 
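
The product AD in Example 2 can be reproduced with a few lines of code. This is a minimal sketch in plain Python; the helper `mat_mul` is an illustrative name, and the vertex list (including the top-left corner (0, 8) as the eighth vertex) is taken to be the eight vertices of the letter N from Example 1.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Data matrix D: row 0 holds x-coordinates, row 1 holds y-coordinates.
# The eighth column (0, 8) is assumed to be the top-left corner of the N.
D = [[0, 0.5, 0.5, 6, 6, 5.5, 5.5, 0],
     [0, 0, 6.42, 0, 8, 8, 1.58, 8]]

# Shear matrix from Example 2: x is shifted by .25 times y, y is unchanged.
A = [[1, 0.25],
     [0, 1]]

AD = mat_mul(A, D)
for vertex, (x, y) in enumerate(zip(*AD), start=1):
    print(vertex, round(x, 3), round(y, 3))
```

Each printed row is one column of AD, that is, the image of one vertex under the shear. The y-coordinates are unchanged, and each x-coordinate moves right by one quarter of the vertex's height.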

The italic N in Fig. 2 looks a bit too wide. To compensate, shrink the width by a 
scale transformation that affects the x-coordinates of the points. 

EXAMPLE 3 Compute the matrix of the transformation that performs a shear 
transformation, as in Example 2, and then scales all x-coordinates by a factor of .75. 

SOLUTION The matrix that multiplies the x-coordinate of a point by .75 is 

\[
S = \begin{bmatrix} .75 & 0 \\ 0 & 1 \end{bmatrix}
\]

So the matrix of the composite transformation is 

\[
SA = \begin{bmatrix} .75 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & .25 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} .75 & .1875 \\ 0 & 1 \end{bmatrix}
\]

The result of this composite transformation is shown in Fig. 3. ■ 


The mathematics of computer graphics is intimately connected with matrix 
multiplication. Unfortunately, translating an object on a screen does not correspond directly 
to matrix multiplication because translation is not a linear transformation. The standard 
way to avoid this difficulty is to introduce what are called homogeneous coordinates. 

[Figure: translation of a figure in the $x_1x_2$-plane.]


Homogeneous Coordinates 

Each point (x, y) in $\mathbb{R}^2$ can be identified with the point (x, y, 1) on the plane in $\mathbb{R}^3$ 
that lies one unit above the xy-plane. We say that (x, y) has homogeneous coordinates 
(x, y, 1). For instance, the point (0, 0) has homogeneous coordinates (0, 0, 1). 
Homogeneous coordinates for points are not added or multiplied by scalars, but they can be 
transformed via multiplication by 3 × 3 matrices. 

EXAMPLE 4 A translation of the form $(x, y) \mapsto (x + h, y + k)$ is written in 
homogeneous coordinates as $(x, y, 1) \mapsto (x + h, y + k, 1)$. This transformation can be 
computed via matrix multiplication: 

\[
\begin{bmatrix} 1 & 0 & h \\ 0 & 1 & k \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \begin{bmatrix} x + h \\ y + k \\ 1 \end{bmatrix}
\]

■ 

[Figure panels: Original Figure; After Scaling; After Rotating; After Translating.]


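EXAMPLE 5 Any linear transformation on $\mathbb{R}^2$ is represented with respect to 
homogeneous coordinates by a partitioned matrix of the form 
$\begin{bmatrix} A & 0 \\ 0 & 1 \end{bmatrix}$, where A is a 2 × 2 matrix. Typical examples are 

\[
\begin{bmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\begin{bmatrix} s & 0 & 0 \\ 0 & t & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

From left to right: counterclockwise rotation about the origin through the angle $\varphi$; 
reflection through the line y = x; scaling of x by s and y by t. 

■ 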


Composite Transformations 

The movement of a figure on a computer screen often requires two or more basic 
transformations. The composition of such transformations corresponds to matrix 
multiplication when homogeneous coordinates are used. 


EXAMPLE 6 Find the 3 × 3 matrix that corresponds to the composite transformation 
of a scaling by .3, a rotation of 90° about the origin, and finally a translation that 
adds (−.5, 2) to each point of a figure. 


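SOLUTION If $\varphi = \pi/2$, then $\sin\varphi = 1$ and $\cos\varphi = 0$. From Examples 4 and 5, we 
have 

\[
\begin{aligned}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
&\xrightarrow{\ \text{Scale}\ }
\begin{bmatrix} .3 & 0 & 0 \\ 0 & .3 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \\[4pt]
&\xrightarrow{\ \text{Rotate}\ }
\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .3 & 0 & 0 \\ 0 & .3 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \\[4pt]
&\xrightarrow{\ \text{Translate}\ }
\begin{bmatrix} 1 & 0 & -.5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .3 & 0 & 0 \\ 0 & .3 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\end{aligned}
\]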


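The matrix for the composite transformation is 

\[
\begin{aligned}
\begin{bmatrix} 1 & 0 & -.5 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .3 & 0 & 0 \\ 0 & .3 & 0 \\ 0 & 0 & 1 \end{bmatrix}
&= \begin{bmatrix} 0 & -1 & -.5 \\ 1 & 0 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} .3 & 0 & 0 \\ 0 & .3 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\[4pt]
&= \begin{bmatrix} 0 & -.3 & -.5 \\ .3 & 0 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\end{aligned}
\]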
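
The composite matrix of Example 6 can also be assembled in code from the three basic matrices of Examples 4 and 5. The following is a minimal sketch in plain Python; the helper names `scale`, `rotate`, `translate`, and `mat_mul` are illustrative choices. Note that the matrix applied first (the scaling) appears rightmost in the product.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def scale(s, t):
    """Scale x by s and y by t (homogeneous coordinates)."""
    return [[s, 0, 0], [0, t, 0], [0, 0, 1]]


def rotate(phi):
    """Counterclockwise rotation about the origin through angle phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def translate(h, k):
    """Translation (x, y) -> (x + h, y + k)."""
    return [[1, 0, h], [0, 1, k], [0, 0, 1]]


# Scale by .3, then rotate 90 degrees, then translate by (-.5, 2).
M = mat_mul(translate(-0.5, 2), mat_mul(rotate(math.pi / 2), scale(0.3, 0.3)))

# Apply the composite to the point (1, 0), written as the column (1, 0, 1).
image = mat_mul(M, [[1], [0], [1]])
print([[round(v, 4) for v in row] for row in M])
print([round(row[0], 4) for row in image])
```

Up to floating-point rounding (cos(π/2) is not exactly zero in floating point), the entries of `M` match the matrix computed by hand, and the point (1, 0) is carried to (−.5, 2.3).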


3D Computer Graphics 

Some of the newest and most exciting work in computer graphics is connected with 
molecular modeling. With 3D (three-dimensional) graphics, a biologist can examine a 
simulated protein molecule and search for active sites that might accept a drug molecule. 
The biologist can rotate and translate an experimental drug and attempt to attach it to the 
protein. This ability to visualize potential chemical reactions is vital to modern drug and 
cancer research. In fact, advances in drug design depend to some extent upon progress 
in the ability of computer graphics to construct realistic simulations of molecules and 
their interactions. 1 

Current research in molecular modeling is focused on virtual reality, an environment 
in which a researcher can see and feel the drug molecule slide into the protein. In 
Fig. 4, such tactile feedback is provided by a force-displaying remote manipulator. 



FIGURE 4 Molecular modeling in virtual reality. 

(Computer Science Department, University of 
North Carolina at Chapel Hill. Photo by Bo 
Strain.) 

Another design for virtual reality involves a helmet and glove that detect head, hand, and 
finger movements. The helmet contains two tiny computer screens, one for each eye. 
Making this virtual environment more realistic is a challenge to engineers, scientists, 
and mathematicians. The mathematics we examine here barely opens the door to this 
interesting field of research. 


Homogeneous 3D Coordinates 


By analogy with the 2D case, we say that (x, y, z, 1) are homogeneous coordinates for 
the point (x, y, z) in $\mathbb{R}^3$. In general, (X, Y, Z, H) are homogeneous coordinates for 
(x, y, z) if $H \neq 0$ and 

\[
x = \frac{X}{H}, \qquad y = \frac{Y}{H}, \qquad \text{and} \qquad z = \frac{Z}{H} \tag{1}
\]


Each nonzero scalar multiple of (x, y,z, 1) gives a set of homogeneous coordinates for 
(x, y, z). For instance, both (10, —6,14,2) and (—15, 9, —21, —3) are homogeneous 
coordinates for (5, —3, 7). 

The next example illustrates the transformations used in molecular modeling to 
move a drug into a protein molecule. 


EXAMPLE 7 Give 4 × 4 matrices for the following transformations: 

a. Rotation about the y-axis through an angle of 30°. (By convention, a positive angle 
is the counterclockwise direction when looking toward the origin from the positive 
half of the axis of rotation, in this case the y-axis.) 

b. Translation by the vector p = (−6, 4, 5). 


1 Robert Pool, “Computing in Science,” Science 256, 3 April 1992, p. 45. 



SOLUTION 

a. First, construct the 3 × 3 matrix for the rotation. The vector $\mathbf{e}_1$ rotates down toward 
the negative z-axis, stopping at $(\cos 30°, 0, -\sin 30°) = (\sqrt{3}/2, 0, -.5)$. The vector 
$\mathbf{e}_2$ on the y-axis does not move, but $\mathbf{e}_3$ on the z-axis rotates down toward the positive 
x-axis, stopping at $(\sin 30°, 0, \cos 30°) = (.5, 0, \sqrt{3}/2)$. See Fig. 5. From Section 
1.9, the standard matrix for this rotation is 

\[
\begin{bmatrix} \sqrt{3}/2 & 0 & .5 \\ 0 & 1 & 0 \\ -.5 & 0 & \sqrt{3}/2 \end{bmatrix}
\]

So the rotation matrix for homogeneous coordinates is 

\[
A = \begin{bmatrix}
\sqrt{3}/2 & 0 & .5 & 0 \\
0 & 1 & 0 & 0 \\
-.5 & 0 & \sqrt{3}/2 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

FIGURE 5 

b. We want (x, y, z, 1) to map to (x − 6, y + 4, z + 5, 1). The matrix that does this is 

\[
\begin{bmatrix}
1 & 0 & 0 & -6 \\
0 & 1 & 0 & 4 \\
0 & 0 & 1 & 5 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

■ 


Perspective Projections 

A three-dimensional object is represented on the two-dimensional computer screen by 
projecting the object onto a viewing plane. (We ignore other important steps, such as 
selecting the portion of the viewing plane to display on the screen.) For simplicity, let 
the xy-plane represent the computer screen, and imagine that the eye of a viewer is 
along the positive z-axis, at a point (0, 0, d). A perspective projection maps each point 
(x, y, z) onto an image point (x*, y*, 0) so that the two points and the eye position, 
called the center of projection, are on a line. See Fig. 6(a). 

FIGURE 6 Perspective projection of (x, y, z) onto (x*, y*, 0). 

The triangle in the xz-plane in Fig. 6(a) is redrawn in part (b) showing the lengths 
of line segments. Similar triangles show that 

\[
\frac{x^*}{d} = \frac{x}{d - z}
\qquad \text{and} \qquad
x^* = \frac{dx}{d - z} = \frac{x}{1 - z/d}
\]

Similarly, 

\[
y^* = \frac{y}{1 - z/d}
\]

Using homogeneous coordinates, we can represent the perspective projection by a 
matrix, say, P. We want (x, y, z, 1) to map into 
$\left( \dfrac{x}{1 - z/d}, \dfrac{y}{1 - z/d}, 0, 1 \right)$. Scaling these 
coordinates by 1 − z/d, we can also use (x, y, 0, 1 − z/d) as homogeneous coordinates 
for the image. Now it is easy to display P. In fact, 

\[
P \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & -1/d & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
= \begin{bmatrix} x \\ y \\ 0 \\ 1 - z/d \end{bmatrix}
\]

EXAMPLE 8 Let S be the box with vertices (3, 1, 5), (5, 1, 5), (5, 0, 5), (3, 0, 5), 
(3, 1, 4), (5, 1, 4), (5, 0, 4), and (3, 0, 4). Find the image of S under the perspective 
projection with center of projection at (0, 0, 10). 

SOLUTION Let P be the projection matrix, and let D be the data matrix for S using 
homogeneous coordinates. The data matrix for the image of S is shown below; the columns 
correspond to vertices 1 through 8. 

\[
PD = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & -1/10 & 1
\end{bmatrix}
\begin{bmatrix}
3 & 5 & 5 & 3 & 3 & 5 & 5 & 3 \\
1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\
5 & 5 & 5 & 5 & 4 & 4 & 4 & 4 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
= \begin{bmatrix}
3 & 5 & 5 & 3 & 3 & 5 & 5 & 3 \\
1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
.5 & .5 & .5 & .5 & .6 & .6 & .6 & .6
\end{bmatrix}
\]

To obtain $\mathbb{R}^3$ coordinates, use equation (1) before Example 7, and divide the top three 
entries in each column by the corresponding entry in the fourth row: 

\[
\begin{array}{lcccccccc}
\text{Vertex:} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
x^* & 6 & 10 & 10 & 6 & 5 & 8.3 & 8.3 & 5 \\
y^* & 2 & 2 & 0 & 0 & 1.7 & 1.7 & 0 & 0 \\
z^* & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}
\]

■ 

[Figure: S under the perspective transformation.]

WEB 

This text’s web site has some interesting applications of computer graphics, including 
a further discussion of perspective projections. One of the computer projects on the 
web site involves simple animation. 


NUMERICAL NOTE 

Continuous movement of graphical 3D objects requires intensive computation 
with 4 × 4 matrices, particularly when the surfaces are rendered to appear 
realistic, with texture and appropriate lighting. High-end computer graphics 
boards have 4 × 4 matrix operations and graphics algorithms embedded in their 
microchips and circuitry. Such boards can perform the billions of matrix 
multiplications per second needed for realistic color animation in 3D gaming programs. 2 
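
Returning to the perspective projection of Example 8, the two steps (multiply by P, then divide by the fourth homogeneous coordinate) can be sketched in a few lines of plain Python. The function names `perspective_matrix` and `project` are illustrative assumptions, and the code assumes no vertex lies in the plane z = d, where the fourth coordinate would be zero.

```python
def perspective_matrix(d):
    """4 x 4 matrix P for projection onto the xy-plane from the eye at (0, 0, d)."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 0],
            [0, 0, -1 / d, 1]]


def project(points, d):
    """Return the images (x*, y*, z*) of 3D points under the perspective projection."""
    P = perspective_matrix(d)
    images = []
    for x, y, z in points:
        v = (x, y, z, 1)  # homogeneous coordinates of the point
        X, Y, Z, H = (sum(P[i][j] * v[j] for j in range(4)) for i in range(4))
        # Divide by the fourth coordinate H = 1 - z/d, as in equation (1).
        images.append((X / H, Y / H, Z / H))
    return images


# Vertices of the box S from Example 8, center of projection at (0, 0, 10).
S = [(3, 1, 5), (5, 1, 5), (5, 0, 5), (3, 0, 5),
     (3, 1, 4), (5, 1, 4), (5, 0, 4), (3, 0, 4)]

for vertex, (xs, ys, zs) in enumerate(project(S, 10), start=1):
    print(vertex, round(xs, 1), round(ys, 1), round(zs, 1))
```

Vertices with z = 5 are divided by .5 and those with z = 4 by .6, so the face of the box nearer the eye appears larger on the screen.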


Further Reading 

James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes, Computer 
Graphics: Principles and Practice, 3rd ed. (Boston, MA: Addison-Wesley, 2002), 
Chapters 5 and 6. 

PRACTICE PROBLEM 

Rotation of a figure about a point p in $\mathbb{R}^2$ is accomplished by first translating the figure 
by −p, rotating about the origin, and then translating back by p. See Fig. 7. Construct 
the 3 × 3 matrix that rotates points −30° about the point (−2, 6), using homogeneous 
coordinates. 



FIGURE 7 Rotation of figure about point p. Panels: (a) Original figure. (b) Translated to 
origin by −p. (c) Rotated about the origin. (d) Translated back by p. 

2.7 EXERCISES 


1. What 3x3 matrix will have the same effect on homogeneous 
coordinates for R 2 that the shear matrix A has in Example 2? 

2. Use matrix multiplication to find the image of the triangle 

厂 4 2 51 

with data matrix D = ^ ^ ^ under the transforma¬ 

tion that reflects points through the j-axis. Sketch both the 
original triangle and its image. 

In Exercises 3-8, find the 3 × 3 matrices that produce the described 
composite 2D transformations, using homogeneous coordinates. 

3. Translate by (2, 1), and then rotate 90° about the origin. 

4. Translate by (−1, 4), and then scale the x-coordinate by 1/2 
and the y-coordinate by 3/2. 


5. Reflect points through the x-axis, and then rotate 45° about 
the origin. 

6. Rotate points 45° about the origin, then reflect through the 
x-axis. 

7. Rotate points through 60° about the point (6, 8). 

8. Rotate points through 45° about the point (3,7). 

9. A 2 x 100 data matrix D contains the coordinates of 100 
points. Compute the number of multiplications required to 
transform these points using two arbitrary 2x2 matrices A 
and B. Consider the two possibilities A(BD) and (AB)D. 
Discuss the implications of your results for computer graphics 
calculations. 


2 See Jan Ozer, “High-Performance Graphics Boards,” PC Magazine 19, 1 September 2000, pp. 187-200. 
Also, “The Ultimate Upgrade Guide: Moving On Up,” PC Magazine 21, 29 January 2002, pp. 82-91. 

10. Consider the following geometric 2D transformations: D, a 
dilation (in which x-coordinates and y-coordinates are scaled 
by the same factor); R, a rotation; and T, a translation. Does 
D commute with R? That is, is D(R(x)) = R(D(x)) for all 
x in $\mathbb{R}^2$? Does D commute with T? Does R commute with 
T? 


11. A rotation on a computer screen is sometimes implemented
as the product of two shear-and-scale transformations, which
can speed up calculations that determine how a graphic image
actually appears in terms of screen pixels. (The screen consists
of rows and columns of small dots, called pixels.) The
first transformation $A_1$ shears vertically and then compresses
each column of pixels; the second transformation $A_2$ shears
horizontally and then stretches each row of pixels. Let

\[
A_1 = \begin{bmatrix} 1 & 0 & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
A_2 = \begin{bmatrix} \sec\varphi & -\tan\varphi & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

and show that the composition of the two transformations is
a rotation in $\mathbb{R}^2$.

17. Give the 4 × 4 matrix that rotates points in $\mathbb{R}^3$ about the
x-axis through an angle of 60°. (See the figure.)

18. Give the 4 × 4 matrix that rotates points in $\mathbb{R}^3$ about the
z-axis through an angle of -30°, and then translates by
p = (5, -2, 1).

19. Let S be the triangle with vertices (4.2, 1.2, 4), (6, 4, 2), and
(2, 2, 6). Find the image of S under the perspective projection
with center of projection at (0, 0, 10).



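12. A rotation in $\mathbb{R}^2$ usually requires four multiplications.
Compute the product below, and show that the matrix for a rotation
can be factored into three shear transformations (each of
which requires only one multiplication).

\[
\begin{bmatrix} 1 & -\tan\varphi/2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ \sin\varphi & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -\tan\varphi/2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]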


13. The usual transformations on homogeneous coordinates for
2D computer graphics involve 3 × 3 matrices of the form
$\begin{bmatrix} A & \mathbf{p} \\ \mathbf{0}^T & 1 \end{bmatrix}$, where A is a 2 × 2 matrix and p is in $\mathbb{R}^2$. Show
that such a transformation amounts to a linear transformation
on $\mathbb{R}^2$ followed by a translation. [Hint: Find an appropriate
matrix factorization involving partitioned matrices.]

14. Show that the transformation in Exercise 7 is equivalent to 
a rotation about the origin followed by a translation by p. 
Find p. 

15. What vector in R 3 has homogeneous coordinates 
(i _I _I 丄）？ 

16. Are (1, -2, -3, 4) and (10, -20, -30, 40) homogeneous
coordinates for the same point in $\mathbb{R}^3$? Why or why not?


20. Let S be the triangle with vertices (7, 3, —5), (12,8,2), and 
(1,2,1). Find the image of S under the perspective projection 
with center of projection at (0,0,10). 

Exercises 21 and 22 concern the way in which color is specified 
for display in computer graphics. A color on a computer screen 
is encoded by three numbers (R, G, B) that list the amounts of 
energy an electron gun must transmit to red, green, and blue 
phosphor dots on the computer screen. (A fourth number specifies 
the luminance or intensity of the color.) 

21. [M] The actual color a viewer sees on a screen is influenced
by the specific type and amount of phosphors on the screen.
So each computer screen manufacturer must convert between
the (R, G, B) data and an international CIE standard for
color, which uses three primary colors, called X, Y, and Z.
A typical conversion for short-persistence phosphors is

\[
\begin{bmatrix} .61 & .29 & .150 \\ .35 & .59 & .063 \\ .04 & .12 & .787 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\]


A computer program will send a stream of color information 
to the screen, using standard CIE data (X, Y, Z). Find the 
equation that converts these data to the (R, G, B) data needed 
for the screen’s electron gun. 

22. [M] The signal broadcast by commercial television describes
each color by a vector (Y, I, Q). If the screen is black and
white, only the Y-coordinate is used. (This gives a better
monochrome picture than using CIE data for colors.) The
correspondence between YIQ and a “standard” RGB color is
given by

\[
\begin{bmatrix} Y \\ I \\ Q \end{bmatrix}
=
\begin{bmatrix} .299 & .587 & .114 \\ .596 & -.275 & -.321 \\ .212 & -.528 & .311 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]

(A screen manufacturer would change the matrix entries to
work for its RGB screens.) Find the equation that converts
the YIQ data transmitted by the television station to the RGB
data needed for the television screen.


SOLUTION TO PRACTICE PROBLEM 


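Assemble the matrices right-to-left for the three operations. Using p = (-2, 6),
cos(-30°) = √3/2, and sin(-30°) = -.5, we have

\[
\underbrace{\begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{Translate back by } \mathbf{p}}
\underbrace{\begin{bmatrix} \sqrt{3}/2 & 1/2 & 0 \\ -1/2 & \sqrt{3}/2 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{Rotate around the origin}}
\underbrace{\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & -6 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{Translate by } -\mathbf{p}}
=
\begin{bmatrix} \sqrt{3}/2 & 1/2 & \sqrt{3} - 5 \\ -1/2 & \sqrt{3}/2 & -3\sqrt{3} + 5 \\ 0 & 0 & 1 \end{bmatrix}
\]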
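The right-to-left assembly above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matmul`, `translation`, `rotation`, and `rotation_about` are illustrative choices, not notation from the text.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def translation(h, k):
    """Homogeneous 3x3 matrix that translates points by (h, k)."""
    return [[1.0, 0.0, h], [0.0, 1.0, k], [0.0, 0.0, 1.0]]

def rotation(degrees):
    """Homogeneous 3x3 matrix that rotates points about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_about(degrees, p):
    """Translate by -p, rotate about the origin, then translate back by p."""
    px, py = p
    return matmul(translation(px, py),
                  matmul(rotation(degrees), translation(-px, -py)))

M = rotation_about(-30, (-2, 6))
for row in M:
    print([round(x, 4) for x in row])
```

A quick sanity check on any such matrix is that the center of rotation is a fixed point: multiplying M by the homogeneous coordinates (-2, 6, 1) should return (-2, 6, 1), up to rounding.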




2.8 SUBSPACES OF $\mathbb{R}^n$


This section focuses on important sets of vectors in $\mathbb{R}^n$ called subspaces. Often
subspaces arise in connection with some matrix A, and they provide useful information
about the equation Ax = b. The concepts and terminology in this section will be used
repeatedly throughout the rest of the book. 1


DEFINITION

A subspace of $\mathbb{R}^n$ is any set H in $\mathbb{R}^n$ that has three properties:

a. The zero vector is in H.

b. For each u and v in H, the sum u + v is in H.

c. For each u in H and each scalar c, the vector cu is in H.

FIGURE 1 Span$\{\mathbf{v}_1, \mathbf{v}_2\}$ as a plane through the origin.


In words, a subspace is closed under addition and scalar multiplication. As you will 
see in the next few examples, most sets of vectors discussed in Chapter 1 are subspaces. 
For instance, a plane through the origin is the standard way to visualize the subspace in 
Example 1. See Fig. 1. 

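EXAMPLE 1 If $\mathbf{v}_1$ and $\mathbf{v}_2$ are in $\mathbb{R}^n$ and $H = \text{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, then H is a subspace
of $\mathbb{R}^n$. To verify this statement, note that the zero vector is in H (because $0\mathbf{v}_1 + 0\mathbf{v}_2$ is
a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$). Now take two arbitrary vectors in H, say,

\[ \mathbf{u} = s_1\mathbf{v}_1 + s_2\mathbf{v}_2 \quad \text{and} \quad \mathbf{v} = t_1\mathbf{v}_1 + t_2\mathbf{v}_2 \]

Then

\[ \mathbf{u} + \mathbf{v} = (s_1 + t_1)\mathbf{v}_1 + (s_2 + t_2)\mathbf{v}_2 \]

which shows that u + v is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$ and hence is in H. Also, for
any scalar c, the vector cu is in H, because $c\mathbf{u} = c(s_1\mathbf{v}_1 + s_2\mathbf{v}_2) = (cs_1)\mathbf{v}_1 + (cs_2)\mathbf{v}_2$. ■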


1 Sections 2.8 and 2.9 are included here to permit readers to postpone the study of most or all of the next two 
chapters and to skip directly to Chapter 5, if so desired. Omit these two sections if you plan to work through 
Chapter 4 before beginning Chapter 5. 
If $\mathbf{v}_1$ is not zero and if $\mathbf{v}_2$ is a multiple of $\mathbf{v}_1$, then $\mathbf{v}_1$ and $\mathbf{v}_2$ simply span a line
through the origin. So a line through the origin is another example of a subspace.

[Figure: a line through the origin, with $\mathbf{v}_1 \neq \mathbf{0}$ and $\mathbf{v}_2 = k\mathbf{v}_1$.]


EXAMPLE 2 A line L not through the origin is not a subspace, because it does not 
contain the origin, as required. Also, Fig. 2 shows that L is not closed under addition 
or scalar multiplication. ■ 



u + v is not on L 

FIGURE 2 



EXAMPLE 3 For $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$, the set of all linear combinations of $\mathbf{v}_1, \ldots, \mathbf{v}_p$
is a subspace of $\mathbb{R}^n$. The verification of this statement is similar to the argument given
in Example 1. We shall now refer to Span$\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ as the subspace spanned (or
generated) by $\mathbf{v}_1, \ldots, \mathbf{v}_p$. ■

Note that $\mathbb{R}^n$ is a subspace of itself because it has the three properties required for a
subspace. Another special subspace is the set consisting of only the zero vector in $\mathbb{R}^n$.
This set, called the zero subspace, also satisfies the conditions for a subspace.


Column Space and Null Space of a Matrix 

Subspaces of $\mathbb{R}^n$ usually occur in applications and theory in one of two ways. In both
cases, the subspace can be related to a matrix.


The column space of a matrix A is the set Col A of all linear combinations of the 
columns of A. 



If $A = [\mathbf{a}_1 \ \cdots \ \mathbf{a}_n]$, with the columns in $\mathbb{R}^m$, then Col A is the same as
Span$\{\mathbf{a}_1, \ldots, \mathbf{a}_n\}$. Example 4 shows that the column space of an m × n matrix is a
subspace of $\mathbb{R}^m$. Note that Col A equals $\mathbb{R}^m$ only when the columns of A span $\mathbb{R}^m$.
Otherwise, Col A is only part of $\mathbb{R}^m$.

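EXAMPLE 4 Let $A = \begin{bmatrix} 1 & -3 & -4 \\ -4 & 6 & -2 \\ -3 & 7 & 6 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 3 \\ 3 \\ -4 \end{bmatrix}$. Determine whether b is
in the column space of A.

SOLUTION The vector b is a linear combination of the columns of A if and only if
b can be written as Ax for some x, that is, if and only if the equation Ax = b has a
solution. Row reducing the augmented matrix [A b],

\[
\begin{bmatrix} 1 & -3 & -4 & 3 \\ -4 & 6 & -2 & 3 \\ -3 & 7 & 6 & -4 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -3 & -4 & 3 \\ 0 & -6 & -18 & 15 \\ 0 & -2 & -6 & 5 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -3 & -4 & 3 \\ 0 & -6 & -18 & 15 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

we conclude that Ax = b is consistent and b is in Col A. ■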
The solution of Example 4 shows that when a system of linear equations is written
in the form Ax = b, the column space of A is the set of all b for which the system has
a solution.
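The consistency test for membership in Col A can be sketched in a few lines of Python. This is only an illustration using exact rational arithmetic from the standard library; the function names `rref` and `in_column_space` are hypothetical helpers, not part of the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def in_column_space(A, b):
    """b is in Col A exactly when [A b] has no pivot in its last column."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(augmented)
    return len(A[0]) not in pivots

A = [[1, -3, -4], [-4, 6, -2], [-3, 7, 6]]
b = [3, 3, -4]
print(in_column_space(A, b))          # the vector b from Example 4
print(in_column_space(A, [0, 0, 1]))  # a vector chosen here for contrast
```

The second call uses a vector picked only for this sketch; for it the augmented matrix acquires a pivot in the last column, so the system is inconsistent.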

The null space of a matrix A is the set Nul A of all solutions of the homogeneous 
equation Ax = 0. 


When A has n columns, the solutions of Ax = 0 belong to $\mathbb{R}^n$, and the null space
of A is a subset of $\mathbb{R}^n$. In fact, Nul A has the properties of a subspace of $\mathbb{R}^n$.


THEOREM 12 The null space of an m × n matrix A is a subspace of $\mathbb{R}^n$. Equivalently, the
set of all solutions of a system Ax = 0 of m homogeneous linear equations in
n unknowns is a subspace of $\mathbb{R}^n$.


PROOF The zero vector is in Nul A (because A0 = 0). To show that Nul A satisfies the
other two properties required for a subspace, take any u and v in Nul A. That is, suppose
Au = 0 and Av = 0. Then, by a property of matrix multiplication,

A(u + v) = Au + Av = 0 + 0 = 0

Thus u + v satisfies Ax = 0, and so u + v is in Nul A. Also, for any scalar c, A(cu) =
c(Au) = c(0) = 0, which shows that cu is in Nul A. ■

To test whether a given vector v is in Nul A, just compute Av to see whether Av is
the zero vector. Because Nul A is described by a condition that must be checked for each
vector, we say that the null space is defined implicitly. In contrast, the column space is
defined explicitly, because vectors in Col A can be constructed (by linear combinations)
from the columns of A. To create an explicit description of Nul A, solve the equation
Ax = 0 and write the solution in parametric vector form. (See Example 6, below.) 2
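The implicit test is a single matrix-vector product. The sketch below reuses the matrix A of Example 4; the vector v = (-5, -3, 1) is an assumption of this illustration (it was worked out from that example's row reduction and does not appear in the text), and `in_null_space` is a hypothetical helper name.

```python
def in_null_space(A, v):
    """Return True when every entry of the product A v is zero (exact integer data assumed)."""
    return all(sum(a * x for a, x in zip(row, v)) == 0 for row in A)

A = [[1, -3, -4], [-4, 6, -2], [-3, 7, 6]]
v = [-5, -3, 1]

print(in_null_space(A, v))                   # v solves Ax = 0
print(in_null_space(A, [3 * x for x in v]))  # so does every scalar multiple of v
print(in_null_space(A, [1, 0, 0]))           # a vector outside Nul A
```

The second call illustrates property (c) of a subspace for this one vector; it is a spot check, not a proof.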

Basis for a Subspace 

Because a subspace typically contains an infinite number of vectors, some problems 
involving a subspace are handled best by working with a small finite set of vectors that 
span the subspace. The smaller the set, the better. It can be shown that the smallest 
possible spanning set must be linearly independent. 


DEFINITION

A basis for a subspace H of $\mathbb{R}^n$ is a linearly independent set in H that spans H.

FIGURE 3 The standard basis for $\mathbb{R}^3$.

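EXAMPLE 5 The columns of an invertible n × n matrix form a basis for all of $\mathbb{R}^n$
because they are linearly independent and span $\mathbb{R}^n$, by the Invertible Matrix Theorem.
One such matrix is the n × n identity matrix. Its columns are denoted by $\mathbf{e}_1, \ldots, \mathbf{e}_n$:

\[
\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad
\mathbf{e}_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}
\]

The set $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ is called the standard basis for $\mathbb{R}^n$. See Fig. 3. ■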


2 The contrast between Nul A and Col A is discussed further in Section 4.2. 
The next example shows that the standard procedure for writing the solution set of
Ax = 0 in parametric vector form actually identifies a basis for Nul A. This fact will be
used throughout Chapter 5.


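EXAMPLE 6 Find a basis for the null space of the matrix

\[
A = \begin{bmatrix} -3 & 6 & -1 & 1 & -7 \\ 1 & -2 & 2 & 3 & -1 \\ 2 & -4 & 5 & 8 & -4 \end{bmatrix}
\]

SOLUTION First, write the solution of Ax = 0 in parametric vector form:

\[
[A \ \ \mathbf{0}] \sim \begin{bmatrix} 1 & -2 & 0 & -1 & 3 & 0 \\ 0 & 0 & 1 & 2 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},
\qquad
\begin{aligned}
x_1 - 2x_2 - x_4 + 3x_5 &= 0 \\
x_3 + 2x_4 - 2x_5 &= 0 \\
0 &= 0
\end{aligned}
\]

The general solution is $x_1 = 2x_2 + x_4 - 3x_5$, $x_3 = -2x_4 + 2x_5$, with $x_2$, $x_4$, and $x_5$
free.

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
=
\begin{bmatrix} 2x_2 + x_4 - 3x_5 \\ x_2 \\ -2x_4 + 2x_5 \\ x_4 \\ x_5 \end{bmatrix}
= x_2 \underbrace{\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{\mathbf{u}}
+ x_4 \underbrace{\begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}}
+ x_5 \underbrace{\begin{bmatrix} -3 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{w}}
= x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w} \tag{1}
\]

Equation (1) shows that Nul A coincides with the set of all linear combinations of u,
v, and w. That is, {u, v, w} generates Nul A. In fact, this construction of u, v, and w
automatically makes them linearly independent, because equation (1) shows that $\mathbf{0} =
x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w}$ only if the weights $x_2$, $x_4$, and $x_5$ are all zero. (Examine entries 2, 4,
and 5 in the vector $x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w}$.) So {u, v, w} is a basis for Nul A. ■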
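The procedure of Example 6 (row reduce, then read off one basis vector per free variable) can be sketched as follows. The code is a hedged illustration with exact rational arithmetic; `rref` and `null_space_basis` are hypothetical helper names introduced here.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(A):
    """One basis vector per free variable, as in the parametric vector form."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                 # set this free variable to 1, the others to 0
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][f]        # solve for each basic variable
        basis.append(v)
    return basis

A = [[-3, 6, -1, 1, -7], [1, -2, 2, 3, -1], [2, -4, 5, 8, -4]]
for vec in null_space_basis(A):
    print([int(x) for x in vec])
```

For this A the three vectors produced are the u, v, and w of equation (1), in that order.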


Finding a basis for the column space of a matrix is actually less work than finding 
a basis for the null space. However, the method requires some explanation. Let’s begin 
with a simple case. 


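EXAMPLE 7 Find a basis for the column space of the matrix

\[
B = \begin{bmatrix} 1 & 0 & -3 & 5 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

SOLUTION Denote the columns of B by $\mathbf{b}_1, \ldots, \mathbf{b}_5$ and note that $\mathbf{b}_3 = -3\mathbf{b}_1 + 2\mathbf{b}_2$ and
$\mathbf{b}_4 = 5\mathbf{b}_1 - \mathbf{b}_2$. The fact that $\mathbf{b}_3$ and $\mathbf{b}_4$ are combinations of the pivot columns means that
any combination of $\mathbf{b}_1, \ldots, \mathbf{b}_5$ is actually just a combination of $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_5$. Indeed,
if v is any vector in Col B, say,

\[ \mathbf{v} = c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + c_3\mathbf{b}_3 + c_4\mathbf{b}_4 + c_5\mathbf{b}_5 \]

then, substituting for $\mathbf{b}_3$ and $\mathbf{b}_4$, we can write v in the form

\[ \mathbf{v} = c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + c_3(-3\mathbf{b}_1 + 2\mathbf{b}_2) + c_4(5\mathbf{b}_1 - \mathbf{b}_2) + c_5\mathbf{b}_5 \]

which is a linear combination of $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_5$. So $\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_5\}$ spans Col B. Also, $\mathbf{b}_1$,
$\mathbf{b}_2$, and $\mathbf{b}_5$ are linearly independent, because they are columns from an identity matrix.
So the pivot columns of B form a basis for Col B. ■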



















The matrix B in Example 7 is in reduced echelon form. To handle a general matrix 
A, recall that linear dependence relations among the columns of A can be expressed 
in the form Ax = 0 for some x. (If some columns are not involved in a particular 
dependence relation, then the corresponding entries in x are zero.) When A is row 
reduced to echelon form B, the columns are drastically changed, but the equations 
Ax = 0 and Bx = 0 have the same set of solutions. That is, the columns of A have 
exactly the same linear dependence relationships as the columns of B. 


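EXAMPLE 8 It can be verified that the matrix

\[
A = [\mathbf{a}_1 \ \ \mathbf{a}_2 \ \ \cdots \ \ \mathbf{a}_5] =
\begin{bmatrix} 1 & 3 & 3 & 2 & -9 \\ -2 & -2 & 2 & -8 & 2 \\ 2 & 3 & 0 & 7 & 1 \\ 3 & 4 & -1 & 11 & -8 \end{bmatrix}
\]

is row equivalent to the matrix B in Example 7. Find a basis for Col A.

SOLUTION From Example 7, the pivot columns of A are columns 1, 2, and 5.
Also, $\mathbf{b}_3 = -3\mathbf{b}_1 + 2\mathbf{b}_2$ and $\mathbf{b}_4 = 5\mathbf{b}_1 - \mathbf{b}_2$. Since row operations do not affect linear
dependence relations among the columns of the matrix, we should have

\[ \mathbf{a}_3 = -3\mathbf{a}_1 + 2\mathbf{a}_2 \quad \text{and} \quad \mathbf{a}_4 = 5\mathbf{a}_1 - \mathbf{a}_2 \]

Check that this is true! By the argument in Example 7, $\mathbf{a}_3$ and $\mathbf{a}_4$ are not needed to
generate the column space of A. Also, $\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_5\}$ must be linearly independent, because
any dependence relation among $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_5$ would imply the same dependence relation
among $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_5$. Since $\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_5\}$ is linearly independent, $\{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_5\}$ is also
linearly independent and hence is a basis for Col A. ■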

The argument in Example 8 can be adapted to prove the following theorem. 


THEOREM 13 The pivot columns of a matrix A form a basis for the column space of A. 


Warning: Be careful to use pivot columns of A itself for the basis of Col A. The
columns of an echelon form B are often not in the column space of A. (For instance,
in Examples 7 and 8, the columns of B all have zeros in their last entries and cannot
generate the columns of A.)
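Theorem 13 and the warning can be illustrated together: row reduction is used only to locate the pivot positions, and the basis vectors are then taken from A itself. The sketch below applies this to the matrix of Example 8; `rref` and `column_space_basis` are hypothetical helper names, and exact rational arithmetic is assumed.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def column_space_basis(A):
    """Pivot columns of A itself (not of its echelon form)."""
    _, pivots = rref(A)
    return [[row[j] for row in A] for j in pivots]

A = [[1, 3, 3, 2, -9],
     [-2, -2, 2, -8, 2],
     [2, 3, 0, 7, 1],
     [3, 4, -1, 11, -8]]

B, pivots = rref(A)
print([j + 1 for j in pivots])   # pivot columns, numbered from 1 as in the text
print(column_space_basis(A))     # the columns a1, a2, a5 of A
```

Note that `column_space_basis` indexes into A, not into the reduced matrix B, which is exactly the point of the warning.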


SG Mastering: Subspace, Col A, Nul A, Basis 2-37


PRACTICE PROBLEMS 


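1. Let $A = \begin{bmatrix} 1 & -1 & 5 \\ 2 & 0 & 7 \\ -3 & -5 & -3 \end{bmatrix}$ and $\mathbf{u} = \begin{bmatrix} -7 \\ 3 \\ 2 \end{bmatrix}$. Is u in Nul A? Is u in Col A? Justify
each answer.

2. Given $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$, find a vector in Nul A and a vector in Col A.

3. Suppose an n × n matrix A is invertible. What can you say about Col A? About
Nul A?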
2


Determine if u is in the subspace of $\mathbb{R}^4$ generated
by $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.


6. Let Vi = 


19. 


-6 


.Determine if p is in Col A, where A 


and p 

■ 17 - 
[vi v 2 V3 ]• 

9. With A and p as in Exercise 7, determine if p is in Nul A. 
10. With $\mathbf{u} = \begin{bmatrix} -5 \\ 5 \\ 3 \end{bmatrix}$ and A as in Exercise 8, determine if u is
in Nul A.

In Exercises 11 and 12, give integers p and q such that Nul A is a
subspace of $\mathbb{R}^p$ and Col A is a subspace of $\mathbb{R}^q$.


11. A 


12. A 


2 1 
-4 1 

2 -5 


2 

5 

-1 

7 

3 


3' 

7 

0 

11 

4 


13. For A as in Exercise 11, find a nonzero vector in Nul A and a 
nonzero vector in Col A. 

14. For A as in Exercise 12, find a nonzero vector in Nul A and
a nonzero vector in Col A.

Determine which sets in Exercises 15-20 are bases for R 2 or R 3 . 
Justify each answer. 


16 


16. 


'- 2 ' 


' 4" 

5 

1 

-10 


2.8 EXERCISES 


Exercises 1-4 display sets in $\mathbb{R}^2$. Assume the sets include the
bounding lines. In each case, give a specific reason why the set
H is not a subspace of $\mathbb{R}^2$. (For instance, find two vectors in H
whose sum is not in H, or find a vector in H with a scalar multiple
that is not in H. Draw a picture.)


7. Let 



2 " 


"-3" 


"-4" 

Vl = 

-8 

,V 2 = 

8 

,v 3 = 

6 


6 


-7 


-7 


P 


6 

-10 

11 


and A = [vi V 2 V 3 ]. 


a. How many vectors are in $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$?

b. How many vectors are in Col A?

c. Is p in Col A? Why or why not?

8. Let 




"-2" 


"-2" 


0" 


Vl = 

0 

,v 2 = 

3 

,v 3 = 

-5 

2 . 


6 


3 


5 


3. 


4. 



r 


"-2" 


"-3" 

Let Vi = 

3 

-4 

,V 2 = 

-3 

7 

,and w = 

-3 

10 


Determine if w is in the subspace of $\mathbb{R}^3$ generated by $\mathbf{v}_1$ and $\mathbf{v}_2$.


1 


4 


5 

-3 

2 

,V 2 = 

-4 

5 

,V 3 = 

-3 

6 

3 


7 


5 


， and u : 


15. 




8 . 


0 


5 


6 

0 

0 

3 

-2 

4 

2 
In Exercises 21 and 22, mark each statement True or False. Justify
each answer.

21. a. A subspace of $\mathbb{R}^n$ is any set H such that (i) the zero vector
is in H, (ii) u, v, and u + v are in H, and (iii) c is a scalar
and cu is in H.

b. If $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are in $\mathbb{R}^n$, then Span$\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is the same
as the column space of the matrix $[\mathbf{v}_1 \ \cdots \ \mathbf{v}_p]$.

c. The set of all solutions of a system of m homogeneous
equations in n unknowns is a subspace of $\mathbb{R}^m$.

d. The columns of an invertible n × n matrix form a basis
for $\mathbb{R}^n$.

e. Row operations do not affect linear dependence relations
among the columns of a matrix.


22. a. A subset H of $\mathbb{R}^n$ is a subspace if the zero vector is in H.

b. If B is an echelon form of a matrix A, then the pivot
columns of B form a basis for Col A.

c. Given vectors $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$, the set of all linear
combinations of these vectors is a subspace of $\mathbb{R}^n$.

d. Let H be a subspace of $\mathbb{R}^n$. If x is in H, and y is in $\mathbb{R}^n$,
then x + y is in H.

e. The column space of a matrix A is the set of solutions of
Ax = b.

Exercises 23-26 display a matrix A and an echelon form of A. 

Find a basis for Col A and a basis for Nul A. 


26. A = 


0 

6 


-1 


0 2 
0 0 
0 0 


-3 -1 

3 0 

9 -1 

9 -2 

-3 0 

6 0 
0 -1 
0 0 


2 

-4 

6 

6 

-4 

2 

0 


27. Construct a 3 x 3 matrix A and a nonzero vector b such that 
b is in Col A, but b is not the same as any one of the columns 
of A. 


28. Construct a 3 x 3 matrix A and a vector b such that b is not 
in Col A. 


29. Construct a nonzero 3x3 matrix A and a nonzero vector b 
such that b is in Nul A. 

30. Suppose the columns of a matrix $A = [\mathbf{a}_1 \ \cdots \ \mathbf{a}_p]$ are linearly
independent. Explain why $\{\mathbf{a}_1, \ldots, \mathbf{a}_p\}$ is a basis for Col A.

In Exercises 31-36, respond as comprehensively as possible, and
justify your answer.


31. Suppose F is a 5 × 5 matrix whose column space is not equal
to $\mathbb{R}^5$. What can be said about Nul F?

32. If B is a 7 × 7 matrix and Col B = $\mathbb{R}^7$, what can be said about
solutions of equations of the form Bx = b for b in $\mathbb{R}^7$?

33. If C is a 6 × 6 matrix and Nul C is the zero subspace, what
can be said about solutions of equations of the form Cx = b
for b in $\mathbb{R}^6$?




"4 

5 

9 

-2" 


"1 

2 

6 

-5" 

23. 

A = 

6 

5 

1 

12 

〜 

0 

1 

5 

—6 



_3 

4 

8 

-3_ 


_0 

0 

0 

0 _ 



"3 

-6 

9 

0" 


'1 

-2 

5 

4" 

24. 

A = 

2 

-4 

7 

2 

〜 

0 

0 

3 

6 



3 

-6 

6 

—6 


0 

0 

0 

0 


34. What can be said about the shape of an m × n matrix A when
the columns of A form a basis for $\mathbb{R}^m$?

35. If B is a 5 × 5 matrix and Nul B is not the zero subspace,
what can be said about Col B?

36. What can be said about Nul C when C is a 6 × 4 matrix with
linearly independent columns?

[M] In Exercises 37 and 38, construct bases for the column space 

and the null space of the given matrix A. Justify your work. 


25. 


A = 


1 

4 

8 

-3 

-7 




3 

-5 

0 

-1 

3" 

-1 

2 

7 

3 

4 

37. 

A = 

-7 

9 

-4 

9 

—11 

-2 

2 

9 

5 

5 

-5 

7 

-2 

5 

-7 

_ 3 

6 

9 

-5 

-2 




3 

-7 

-3 

4 

0_ 

"1 

4 

8 

0 

5" 




5 

3 

2 

—6 

-8" 

0 

2 

5 

0 

-1 


38. 

A = 

4 

1 

3 

-8 

-7 

0 

0 

0 

1 

4 


5 

1 

4 

5 

19 

0 

0 

0 

0 

0 




_-7 

-5 

-2 

8 

5_ 


WEB Column Space and Null Space 


WEB A Basis for Col A 
SOLUTIONS TO PRACTICE PROBLEMS


1. To determine whether u is in Nul A, simply compute

\[
A\mathbf{u} = \begin{bmatrix} 1 & -1 & 5 \\ 2 & 0 & 7 \\ -3 & -5 & -3 \end{bmatrix}
\begin{bmatrix} -7 \\ 3 \\ 2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

The result shows that u is in Nul A. Deciding whether u is in Col A requires more
work. Reduce the augmented matrix [A u] to echelon form to determine whether
the equation Ax = u is consistent:

\[
\begin{bmatrix} 1 & -1 & 5 & -7 \\ 2 & 0 & 7 & 3 \\ -3 & -5 & -3 & 2 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -1 & 5 & -7 \\ 0 & 2 & -3 & 17 \\ 0 & -8 & 12 & -19 \end{bmatrix}
\sim
\begin{bmatrix} 1 & -1 & 5 & -7 \\ 0 & 2 & -3 & 17 \\ 0 & 0 & 0 & 49 \end{bmatrix}
\]

The equation Ax = u has no solution, so u is not in Col A.

2. In contrast to Practice Problem 1, finding a vector in Nul A requires more work
than testing whether a specified vector is in Nul A. However, since A is already
in reduced echelon form, the equation Ax = 0 shows that if $\mathbf{x} = (x_1, x_2, x_3)$, then
$x_2 = 0$, $x_3 = 0$, and $x_1$ is a free variable. Thus, a basis for Nul A is v = (1, 0, 0).
Finding just one vector in Col A is trivial, since each column of A is in Col A. In
this particular case, the same vector v is in both Nul A and Col A. For most n × n
matrices, the zero vector of $\mathbb{R}^n$ is the only vector in both Nul A and Col A.

3. If A is invertible, then the columns of A span $\mathbb{R}^n$, by the Invertible Matrix Theorem.
By definition, the columns of any matrix always span the column space, so in this
case Col A is all of $\mathbb{R}^n$. In symbols, Col A = $\mathbb{R}^n$. Also, since A is invertible, the
equation Ax = 0 has only the trivial solution. This means that Nul A is the zero
subspace. In symbols, Nul A = {0}.


2.9 DIMENSION AND RANK 

This section continues the discussion of subspaces and bases for subspaces, beginning 
with the concept of a coordinate system. The definition and example below should make 
a useful new term, dimension, seem quite natural, at least for subspaces of R 3 . 

Coordinate Systems 

The main reason for selecting a basis for a subspace H, instead of merely a spanning
set, is that each vector in H can be written in only one way as a linear combination of
the basis vectors. To see why, suppose $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_p\}$ is a basis for H, and suppose
a vector x in H can be generated in two ways, say,

\[ \mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_p\mathbf{b}_p \quad \text{and} \quad \mathbf{x} = d_1\mathbf{b}_1 + \cdots + d_p\mathbf{b}_p \tag{1} \]

Then, subtracting gives

\[ \mathbf{0} = \mathbf{x} - \mathbf{x} = (c_1 - d_1)\mathbf{b}_1 + \cdots + (c_p - d_p)\mathbf{b}_p \tag{2} \]

Since $\mathcal{B}$ is linearly independent, the weights in (2) must all be zero. That is, $c_j = d_j$
for $1 \le j \le p$, which shows that the two representations in (1) are actually the same.
DEFINITION


Suppose the set $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_p\}$ is a basis for a subspace H. For each x in H,
the coordinates of x relative to the basis $\mathcal{B}$ are the weights $c_1, \ldots, c_p$ such that
$\mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_p\mathbf{b}_p$, and the vector in $\mathbb{R}^p$

\[ [\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ \vdots \\ c_p \end{bmatrix} \]

is called the coordinate vector of x (relative to $\mathcal{B}$) or the $\mathcal{B}$-coordinate vector
of x. 1



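EXAMPLE 1 Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix}$, and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2\}$. Then
$\mathcal{B}$ is a basis for $H = \text{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ because $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent.
Determine if x is in H, and if it is, find the coordinate vector of x relative to $\mathcal{B}$.

SOLUTION If x is in H, then the following vector equation is consistent:

\[
c_1 \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix} + c_2 \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix}
\]

The scalars $c_1$ and $c_2$, if they exist, are the $\mathcal{B}$-coordinates of x. Row operations show
that

\[
\begin{bmatrix} 3 & -1 & 3 \\ 6 & 0 & 12 \\ 2 & 1 & 7 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix}
\]

Thus $c_1 = 2$, $c_2 = 3$, and $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$. The basis $\mathcal{B}$ determines a “coordinate system”
on H, which can be visualized by the grid shown in Fig. 1. ■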
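The computation in Example 1 amounts to row reducing the matrix whose columns are the basis vectors, augmented by x. A minimal sketch follows, assuming the given vectors really are linearly independent; `rref` and `coordinates` are hypothetical helper names, and exact rational arithmetic is used.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def coordinates(basis, x):
    """Coordinate vector of x relative to `basis`, or None if x is not in the span.

    Assumes the vectors in `basis` are linearly independent.
    """
    p = len(basis)
    augmented = [[b[i] for b in basis] + [x[i]] for i in range(len(x))]
    R, pivots = rref(augmented)
    if p in pivots:              # pivot in the last column: inconsistent system
        return None
    return [R[i][p] for i in range(p)]

v1, v2 = [3, 6, 2], [-1, 0, 1]
print(coordinates([v1, v2], [3, 12, 7]))   # the x of Example 1
print(coordinates([v1, v2], [1, 0, 0]))    # a vector chosen here that is not in H
```

Because the basis is ordered, swapping v1 and v2 in the call swaps the entries of the result.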



FIGURE 1 A coordinate system on a plane H in $\mathbb{R}^3$.


1 It is important that the elements of $\mathcal{B}$ are numbered because the entries in $[\mathbf{x}]_{\mathcal{B}}$ depend on the order of the
vectors in $\mathcal{B}$.
Notice that although points in H are also in $\mathbb{R}^3$, they are completely determined by
their coordinate vectors, which belong to $\mathbb{R}^2$. The grid on the plane in Fig. 1 makes
H “look” like $\mathbb{R}^2$. The correspondence $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$ is a one-to-one correspondence
between H and $\mathbb{R}^2$ that preserves linear combinations. We call such a correspondence
an isomorphism, and we say that H is isomorphic to $\mathbb{R}^2$.

In general, if $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_p\}$ is a basis for H, then the mapping $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$ is a
one-to-one correspondence that makes H look and act the same as $\mathbb{R}^p$ (even though the
vectors in H themselves may have more than p entries). (Section 4.4 has more details.)

The Dimension of a Subspace

It can be shown that if a subspace H has a basis of p vectors, then every basis of H must
consist of exactly p vectors. (See Exercises 27 and 28.) Thus the following definition
makes sense.


DEFINITION

The dimension of a nonzero subspace H, denoted by dim H, is the number of
vectors in any basis for H. The dimension of the zero subspace {0} is defined to
be zero. 2

The space $\mathbb{R}^n$ has dimension n. Every basis for $\mathbb{R}^n$ consists of n vectors. A plane
through 0 in $\mathbb{R}^3$ is two-dimensional, and a line through 0 is one-dimensional.

EXAMPLE 2 Recall that the null space of the matrix A in Example 6 in Section 2.8
had a basis of 3 vectors. So the dimension of Nul A in this case is 3. Observe how each
basis vector corresponds to a free variable in the equation Ax = 0. Our construction
always produces a basis in this way. So, to find the dimension of Nul A, simply identify
and count the number of free variables in Ax = 0. ■

DEFINITION

The rank of a matrix A, denoted by rank A, is the dimension of the column space
of A.


Since the pivot columns of A form a basis for Col A, the rank of A is just the number
of pivot columns in A.


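EXAMPLE 3 Determine the rank of the matrix

\[
A = \begin{bmatrix} 2 & 5 & -3 & -4 & 8 \\ 4 & 7 & -4 & -3 & 9 \\ 6 & 9 & -5 & 2 & 4 \\ 0 & -9 & 6 & 5 & -6 \end{bmatrix}
\]

SOLUTION Reduce A to echelon form:

\[
A \sim \begin{bmatrix} 2 & 5 & -3 & -4 & 8 \\ 0 & -3 & 2 & 5 & -7 \\ 0 & -6 & 4 & 14 & -20 \\ 0 & -9 & 6 & 5 & -6 \end{bmatrix}
\sim \cdots \sim
\begin{bmatrix} 2 & 5 & -3 & -4 & 8 \\ 0 & -3 & 2 & 5 & -7 \\ 0 & 0 & 0 & 4 & -6 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

The pivot columns are columns 1, 2, and 4. The matrix A has 3 pivot columns, so rank A = 3. ■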


2 The zero subspace has no basis (because the zero vector by itself forms a linearly dependent set). 









The row reduction in Example 3 reveals that there are two free variables in Ax = 0,
because two of the five columns of A are not pivot columns. (The nonpivot columns
correspond to the free variables in Ax = 0.) Since the number of pivot columns plus the
number of nonpivot columns is exactly the number of columns, the dimensions of Col A
and Nul A have the following useful connection. (See the Rank Theorem in Section 4.6
for additional details.)


THEOREM 14 The Rank Theorem

If a matrix A has n columns, then rank A + dim Nul A = n.


The following theorem is important for applications and will be needed in Chapters
5 and 6. The theorem (proved in Section 4.5) is certainly plausible, if you think of
a p-dimensional subspace as isomorphic to $\mathbb{R}^p$. The Invertible Matrix Theorem shows
that p vectors in $\mathbb{R}^p$ are linearly independent if and only if they also span $\mathbb{R}^p$.

THEOREM 15 The Basis Theorem

Let H be a p-dimensional subspace of $\mathbb{R}^n$. Any linearly independent set of exactly
p elements in H is automatically a basis for H. Also, any set of p elements of
H that spans H is automatically a basis for H.


Rank and the Invertible Matrix Theorem

The various vector space concepts associated with a matrix provide several more
statements for the Invertible Matrix Theorem. They are presented below to follow the
statements in the original theorem in Section 2.3.


THEOREM The Invertible Matrix Theorem (continued)

Let A be an n × n matrix. Then the following statements are each equivalent to
the statement that A is an invertible matrix.

m. The columns of A form a basis of $\mathbb{R}^n$.

n. Col A = $\mathbb{R}^n$

o. dim Col A = n

p. rank A = n

q. Nul A = {0}

r. dim Nul A = 0

PROOF Statement (m) is logically equivalent to statements (e) and (h) regarding linear
independence and spanning. The other five statements are linked to the earlier ones of
the theorem by the following chain of almost trivial implications:

(g) ⇒ (n) ⇒ (o) ⇒ (p) ⇒ (r) ⇒ (q) ⇒ (d)


Statement (g), which says that the equation Ax = b has at least one solution for each
b in $\mathbb{R}^n$, implies statement (n), because Col A is precisely the set of all b such that
the equation Ax = b is consistent. The implications (n) ⇒ (o) ⇒ (p) follow from the
definitions of dimension and rank. If the rank of A is the number of columns of A,
then dim Nul A = 0, by the Rank Theorem, and so Nul A = {0}. Thus (p) ⇒ (r) ⇒ (q).
Also, statement (q) implies that the equation Ax = 0 has only the trivial solution, which
is statement (d). Since statements (d) and (g) are already known to be equivalent to the
statement that A is invertible, the proof is complete. ■
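The Rank Theorem used in this proof can be checked on the matrix of Example 3 by counting pivot and nonpivot columns. The sketch below is illustrative only; `rref` and `rank_and_nullity` are hypothetical helper names, and exact rational arithmetic is assumed.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def rank_and_nullity(A):
    """rank A = number of pivot columns; dim Nul A = number of free variables."""
    _, pivots = rref(A)
    free = [j for j in range(len(A[0])) if j not in pivots]
    return len(pivots), len(free)

A = [[2, 5, -3, -4, 8],
     [4, 7, -4, -3, 9],
     [6, 9, -5, 2, 4],
     [0, -9, 6, 5, -6]]

rank, nullity = rank_and_nullity(A)
print(rank, nullity, rank + nullity == len(A[0]))
```

An invertible matrix is the special case in which every column is a pivot column, so the nullity is 0, matching statements (p) and (r).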


NUMERICAL NOTES

Many algorithms discussed in this text are useful for understanding concepts
and making simple computations by hand. However, the algorithms are often
unsuitable for large-scale problems in real life.

Rank determination is a good example. It would seem easy to reduce a matrix
to echelon form and count the pivots. But unless exact arithmetic is performed
on a matrix whose entries are specified exactly, row operations can change the
apparent rank of a matrix. For instance, if the value of x in the matrix $\begin{bmatrix} 5 & 7 \\ 5 & x \end{bmatrix}$
is not stored exactly as 7 in a computer, then the rank may be 1 or 2, depending
on whether the computer treats x - 7 as zero.

In practical applications, the effective rank of a matrix A is often determined 
from the singular value decomposition of A, to be discussed in Section 7.4. 
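The sensitivity described in this note can be reproduced with floating-point elimination. The sketch below is not the singular value approach mentioned above; it is a simpler stand-in that counts pivots while treating entries no larger than a tolerance as zero. The name `numerical_rank`, the perturbation 1e-12, and the tolerance 1e-9 are all assumptions chosen for illustration.

```python
def numerical_rank(rows, tol):
    """Count pivots in floating-point elimination, treating |entry| <= tol as zero."""
    m = [[float(x) for x in row] for row in rows]
    rank = 0
    for c in range(len(m[0])):
        # Partial pivoting: use the largest remaining entry in this column.
        p = max(range(rank, len(m)), key=lambda i: abs(m[i][c]))
        if abs(m[p][c]) <= tol:
            continue
        m[rank], m[p] = m[p], m[rank]
        for i in range(rank + 1, len(m)):
            f = m[i][c] / m[rank][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

x = 7 + 1e-12                      # "7" stored with a tiny error
M = [[5, 7], [5, x]]

print(numerical_rank(M, tol=0.0))    # x - 7 is treated as nonzero
print(numerical_rank(M, tol=1e-9))   # x - 7 is treated as zero
```

The two calls report different ranks for the same stored matrix, which is why practical software has to choose a tolerance rather than test for exact zeros.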


PRACTICE PROBLEMS 


1. Determine the dimension of the subspace H of R 3 spanned by the vectors Vi, \j, 
and V 3 . (First, find a basis for H.) 

Vl 

2. Consider the basis 


2 」’ 

3. Could M 3 possibly contain a four-dimensional subspace? Explain. 


for R 2 . If [x] g = 


2 


3 


-1 

-8 

6 

,v 2 = 

-7 

-1 

,V 3 = 

6 

-7 


B 


jr n 

".2" 

iL- 2 J 

1 


what is 


2.9 EXERCISES 


In Exercises 1 and 2, find the vector x determined by the given
coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ and the given basis $\mathcal{B}$. Illustrate your
answer with a figure, as in the solution of Practice Problem 2.


(「11 


_ 2" 

) r , 

■3_ 

iL 1 」 

• 

-1 

卜 Me = 

2 


2 . 




In Exercises 3-6, the vector x is in a subspace H with a basis
$\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. Find the $\mathcal{B}$-coordinate vector of x.



2 


-1 


0 


3. bi = 

-3 

,b 2 = 

5 

,x = 




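4. $\mathbf{b}_1 = \begin{bmatrix} 1 \\ -5 \end{bmatrix}$,
$\mathbf{b}_2 = \begin{bmatrix} -2 \\ 3 \end{bmatrix}$,
$\mathbf{x} = \begin{bmatrix} 1 \\ 9 \end{bmatrix}$

5. $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 4 \\ -3 \end{bmatrix}$,
$\mathbf{b}_2 = \begin{bmatrix} -2 \\ -7 \\ 5 \end{bmatrix}$,
$\mathbf{x} = \begin{bmatrix} 2 \\ 9 \\ -7 \end{bmatrix}$

6. $\mathbf{b}_1 = \begin{bmatrix} -3 \\ 2 \\ -4 \end{bmatrix}$,
$\mathbf{b}_2 = \begin{bmatrix} 7 \\ -3 \\ 5 \end{bmatrix}$,
$\mathbf{x} = \begin{bmatrix} 5 \\ 0 \\ -2 \end{bmatrix}$

11.
\[
A = \begin{bmatrix}
2 & 4 & -5 & 2 & -3 \\
3 & 6 & -8 & 3 & -5 \\
0 & 0 & 9 & 0 & 9 \\
-3 & -6 & -7 & -3 & -10
\end{bmatrix}
\sim
\begin{bmatrix}
1 & 2 & -5 & 1 & -4 \\
0 & 0 & 5 & 0 & 5 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]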


7. Let bi 


0 


， b2 


2 


-2 


4 


and 


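$\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. Use the figure to estimate $[\mathbf{w}]_{\mathcal{B}}$ and $[\mathbf{x}]_{\mathcal{B}}$.
Confirm your estimate of $[\mathbf{x}]_{\mathcal{B}}$ by using it and $\{\mathbf{b}_1, \mathbf{b}_2\}$ to
compute $\mathbf{x}$.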



8 . Let bi 


"O' 

, b2 = 

'2' 


'-2' 


'2" 

_2_ 

1 

,x = 

3_ 

， y = 

4 


-2.5 


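and $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. Use the figure to estimate
$[\mathbf{x}]_{\mathcal{B}}$, $[\mathbf{y}]_{\mathcal{B}}$, and $[\mathbf{z}]_{\mathcal{B}}$. Confirm your estimates of $[\mathbf{y}]_{\mathcal{B}}$ and $[\mathbf{z}]_{\mathcal{B}}$
by using them and $\{\mathbf{b}_1, \mathbf{b}_2\}$ to compute $\mathbf{y}$ and $\mathbf{z}$.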



Exercises 9-12 display a matrix A and an echelon form of A. Find 
bases for Col A and Nul A, and then state the dimensions of these 
subspaces. 


9. A 


10 . A 


'1 

3 

2 

—6 


■ 

3 

3 

2" 

3 

9 

1 

5 


0 0 

5 

-7 

2 

6 

-1 

9 


0 0 

0 

5 

_5 

15 

0 

14 


0 0 

0 

0 

'1 

-2 

-1 

5 4" 




2 

-1 

1 

5 6 




-2 

0 

-2 

1 

—6 




3 

1 

4 

1 

5_ 





1 - 2-1 2 0 

0 110 3 

0 0 0 1 0 

0 0 0 0 1 


In Exercises 13 and 14, find a basis for the subspace spanned by 
the given vectors. What is the dimension of the subspace? 



15. Suppose a $4 \times 6$ matrix $A$ has four pivot columns. Is
$\operatorname{Col} A = \mathbb{R}^4$? Is $\operatorname{Nul} A = \mathbb{R}^2$? Explain your answers.

16. Suppose a $4 \times 7$ matrix $A$ has three pivot columns. Is
$\operatorname{Col} A = \mathbb{R}^3$? What is the dimension of $\operatorname{Nul} A$? Explain your
answers.

In Exercises 17 and 18, mark each statement True or False. Justify
each answer. Here $A$ is an $m \times n$ matrix.

17. a. If $\mathcal{B} = \{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is a basis for a subspace $H$ and if
$\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$, then $c_1, \dots, c_p$ are the coordinates
of $\mathbf{x}$ relative to the basis $\mathcal{B}$.

b. Each line in $\mathbb{R}^n$ is a one-dimensional subspace of $\mathbb{R}^n$.

c. The dimension of $\operatorname{Col} A$ is the number of pivot columns
in $A$.

d. The dimensions of $\operatorname{Col} A$ and $\operatorname{Nul} A$ add up to the number
of columns in $A$.

e. If a set of $p$ vectors spans a $p$-dimensional subspace $H$
of $\mathbb{R}^n$, then these vectors form a basis for $H$.

18. a. If $\mathcal{B}$ is a basis for a subspace $H$, then each vector in $H$
can be written in only one way as a linear combination of
the vectors in $\mathcal{B}$.

b. The dimension of $\operatorname{Nul} A$ is the number of variables in the
equation $A\mathbf{x} = \mathbf{0}$.

c. The dimension of the column space of $A$ is $\operatorname{rank} A$.


6 0 5 9 6 15 0 

11 III 


4 4 0 0 


4 9 9 
I _ I 


2 1 6 4 2 2 0 0 




2 
d. If $\mathcal{B} = \{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is a basis for a subspace $H$ of $\mathbb{R}^n$,
then the correspondence $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$ makes $H$ look and act
the same as $\mathbb{R}^p$.

e. If $H$ is a $p$-dimensional subspace of $\mathbb{R}^n$, then a linearly
independent set of $p$ vectors in $H$ is a basis for $H$.

In Exercises 19-24, justify each answer or construction. 

19. If the subspace of all solutions of $A\mathbf{x} = \mathbf{0}$ has a basis consisting
of three vectors and if $A$ is a $5 \times 7$ matrix, what is the
rank of $A$?

20. What is the rank of a $6 \times 8$ matrix whose null space is three-dimensional?

21. If the rank of a $9 \times 8$ matrix $A$ is 7, what is the dimension of
the solution space of $A\mathbf{x} = \mathbf{0}$?

22. Show that a set $\{\mathbf{v}_1, \dots, \mathbf{v}_5\}$ in $\mathbb{R}^n$ is linearly dependent if
$\dim \operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_5\} = 4$.

23. If possible, construct a $3 \times 5$ matrix $A$ such that $\dim \operatorname{Nul} A = 3$
and $\dim \operatorname{Col} A = 2$.

24. Construct a 3 x 4 matrix with rank 1. 

25. Let $A$ be an $n \times p$ matrix whose column space is $p$-dimensional.
Explain why the columns of $A$ must be linearly
independent.

26. Suppose columns 1, 3, 4, 5, and 7 of a matrix A are linearly 
independent (but are not necessarily pivot columns) and the 
rank of A is 5. Explain why the five columns mentioned must 
be a basis for the column space of A. 


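27. Suppose vectors $\mathbf{b}_1, \dots, \mathbf{b}_p$ span a subspace $W$, and let
$\{\mathbf{a}_1, \dots, \mathbf{a}_q\}$ be any set in $W$ containing more than $p$ vectors.
Fill in the details of the following argument to show
that $\{\mathbf{a}_1, \dots, \mathbf{a}_q\}$ must be linearly dependent. First, let
$B = [\,\mathbf{b}_1 \ \cdots \ \mathbf{b}_p\,]$ and $A = [\,\mathbf{a}_1 \ \cdots \ \mathbf{a}_q\,]$.

a. Explain why for each vector $\mathbf{a}_j$, there exists a vector $\mathbf{c}_j$
in $\mathbb{R}^p$ such that $\mathbf{a}_j = B\mathbf{c}_j$.

b. Let $C = [\,\mathbf{c}_1 \ \cdots \ \mathbf{c}_q\,]$. Explain why there is a nonzero
vector $\mathbf{u}$ such that $C\mathbf{u} = \mathbf{0}$.

c. Use $B$ and $C$ to show that $A\mathbf{u} = \mathbf{0}$. This shows that the
columns of $A$ are linearly dependent.

28. Use Exercise 27 to show that if $\mathcal{A}$ and $\mathcal{B}$ are bases for a
subspace $W$ of $\mathbb{R}^n$, then $\mathcal{A}$ cannot contain more vectors than
$\mathcal{B}$, and, conversely, $\mathcal{B}$ cannot contain more vectors than $\mathcal{A}$.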

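29. [M] Let $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2\}$. Show that $\mathbf{x}$ is
in $H$, and find the $\mathcal{B}$-coordinate vector of $\mathbf{x}$, when

\[
\mathbf{v}_1 = \begin{bmatrix} 15 \\ -5 \\ 12 \\ 7 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 14 \\ -10 \\ 13 \\ 17 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} 16 \\ 0 \\ 11 \\ -3 \end{bmatrix}
\]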


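30. [M] Let $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Show
that $\mathcal{B}$ is a basis for $H$ and $\mathbf{x}$ is in $H$, and find the $\mathcal{B}$-coordinate
vector of $\mathbf{x}$, when

\[
\mathbf{v}_1 = \begin{bmatrix} -6 \\ 3 \\ -9 \\ 4 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 8 \\ 0 \\ 7 \\ -3 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -9 \\ 4 \\ -8 \\ 3 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} 11 \\ -2 \\ 17 \\ -8 \end{bmatrix}
\]


SOLUTIONS TO PRACTICE PROBLEMS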


1. Construct $A = [\,\mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3\,]$ so that the subspace spanned by $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ is the column
space of $A$. A basis for this space is provided by the pivot columns of $A$.

\[
A = \begin{bmatrix} 2 & 3 & -1 \\ -8 & -7 & 6 \\ 6 & -1 & -7 \end{bmatrix}
\sim \begin{bmatrix} 2 & 3 & -1 \\ 0 & 5 & 2 \\ 0 & -10 & -4 \end{bmatrix}
\sim \begin{bmatrix} 2 & 3 & -1 \\ 0 & 5 & 2 \\ 0 & 0 & 0 \end{bmatrix}
\]

The first two columns of $A$ are pivot columns and form a basis for $H$. Thus
$\dim H = 2$.

2. If $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, then $\mathbf{x}$ is formed from a linear combination of the basis vectors using
weights 3 and 2:

\[
\mathbf{x} = 3\mathbf{b}_1 + 2\mathbf{b}_2
= 3\begin{bmatrix} 1 \\ .2 \end{bmatrix} + 2\begin{bmatrix} .2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 3.4 \\ 2.6 \end{bmatrix}
\]

The basis $\{\mathbf{b}_1, \mathbf{b}_2\}$ determines a coordinate system for $\mathbb{R}^2$, illustrated by the grid in
the figure. Note how $\mathbf{x}$ is 3 units in the $\mathbf{b}_1$-direction and 2 units in the $\mathbf{b}_2$-direction.

3. A four-dimensional subspace would contain a basis of four linearly independent
vectors. This is impossible inside $\mathbb{R}^3$. Since any linearly independent set in $\mathbb{R}^3$ has
no more than three vectors, any subspace of $\mathbb{R}^3$ has dimension no more than 3. The
space $\mathbb{R}^3$ itself is the only three-dimensional subspace of $\mathbb{R}^3$. Other subspaces of $\mathbb{R}^3$
have dimension 2, 1, or 0.
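
The computations in Solutions 1 and 2 can be checked mechanically. The following sketch uses exact rational arithmetic so that no rounding affects the pivot count. The helper name `pivot_columns` is an assumption introduced here for illustration; it is not notation from the text.

```python
from fractions import Fraction


def pivot_columns(matrix):
    """Return the (0-based) indices of the pivot columns, using exact arithmetic."""
    rows = [[Fraction(v) for v in r] for r in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots, row = [], 0
    for col in range(n_cols):
        if row == n_rows:
            break
        # Find a row at or below the current one with a nonzero entry in this column.
        src = next((r for r in range(row, n_rows) if rows[r][col] != 0), None)
        if src is None:
            continue
        rows[row], rows[src] = rows[src], rows[row]
        for r in range(row + 1, n_rows):
            factor = rows[r][col] / rows[row][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[row])]
        pivots.append(col)
        row += 1
    return pivots


# Practice Problem 1: the columns of A are v1, v2, v3.
A = [[2, 3, -1], [-8, -7, 6], [6, -1, -7]]
cols = pivot_columns(A)
basis = [[A[i][j] for i in range(len(A))] for j in cols]
print("pivot columns:", cols, "dim H =", len(cols))
print("basis for H:", basis)

# Practice Problem 2: x = 3*b1 + 2*b2 with b1 = (1, .2), b2 = (.2, 1).
b1 = [Fraction(1), Fraction(1, 5)]
b2 = [Fraction(1, 5), Fraction(1)]
x = [3 * p + 2 * q for p, q in zip(b1, b2)]
print("x =", [float(v) for v in x])
```

The pivot columns are reported with 0-based indices, so columns 0 and 1 correspond to $\mathbf{v}_1$ and $\mathbf{v}_2$.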


CHAPTER 2 SUPPLEMENTARY EXERCISES 


1. Assume that the matrices mentioned in the statements below
have appropriate sizes. Mark each statement True or False.
Justify each answer.

a. If $A$ and $B$ are $m \times n$, then both $AB^T$ and $A^TB$ are
defined.

b. If $AB = C$ and $C$ has 2 columns, then $A$ has 2 columns.

c. Left-multiplying a matrix $B$ by a diagonal matrix $A$, with
nonzero entries on the diagonal, scales the rows of $B$.

d. If $BC = BD$, then $C = D$.

e. If $AC = 0$, then either $A = 0$ or $C = 0$.

f. If $A$ and $B$ are $n \times n$, then $(A + B)(A - B) = A^2 - B^2$.

g. An elementary $n \times n$ matrix has either $n$ or $n + 1$
nonzero entries.

h. The transpose of an elementary matrix is an elementary
matrix.

i. An elementary matrix must be square.


matrices used in the study of electron spin in quantum
mechanics. Show that $A^2 = I$, $B^2 = I$, and $AB = -BA$.
Matrices such that $AB = -BA$ are said to anticommute.


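7. Let $A = \begin{bmatrix} 1 & 3 & 8 \\ 2 & 4 & 11 \\ 1 & 2 & 5 \end{bmatrix}$
and $B = \begin{bmatrix} -3 & 5 \\ 1 & 5 \\ 3 & 4 \end{bmatrix}$. Compute
$A^{-1}B$ without computing $A^{-1}$. [Hint: $A^{-1}B$ is the solution
of the equation $AX = B$.]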

8. Find a matrix A such that the transformation x i-> Ax maps 

,respectively. [Hint: 


1 


and 


2 


into 


1 


and 


Write a matrix equation involving A, and solve for A.] 
5 4 


9. Suppose AB 


and B 


.Find A. 


10. Suppose $A$ is invertible. Explain why $A^TA$ is also invertible.
Then show that $A^{-1} = (A^TA)^{-1}A^T$.


j. Every square matrix is a product of elementary matrices.

k. If $A$ is a $3 \times 3$ matrix with three pivot positions,
there exist elementary matrices $E_1, \dots, E_p$ such that
$E_p \cdots E_1A = I$.

l. If $AB = I$, then $A$ is invertible.

m. If $A$ and $B$ are square and invertible, then $AB$ is invertible,
and $(AB)^{-1} = A^{-1}B^{-1}$.

n. If $AB = BA$ and if $A$ is invertible, then $A^{-1}B = BA^{-1}$.

o. If $A$ is invertible and if $r \neq 0$, then $(rA)^{-1} = rA^{-1}$.

p. If $A$ is a $3 \times 3$ matrix and the equation
$A\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ has
a unique solution, then $A$ is invertible.

2. Find the matrix $C$ whose inverse is $C^{-1} = \begin{bmatrix} 4 & 5 \\ 6 & 7 \end{bmatrix}$.


11. Let $x_1, \dots, x_n$ be fixed numbers. The matrix below, called
a Vandermonde matrix, occurs in applications such as signal
processing, error-correcting codes, and polynomial interpolation.

\[
V = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{bmatrix}
\]

Given $\mathbf{y} = (y_1, \dots, y_n)$ in $\mathbb{R}^n$, suppose $\mathbf{c} = (c_0, \dots, c_{n-1})$ in
$\mathbb{R}^n$ satisfies $V\mathbf{c} = \mathbf{y}$, and define the polynomial

\[ p(t) = c_0 + c_1t + c_2t^2 + \cdots + c_{n-1}t^{n-1} \]

a. Show that $p(x_1) = y_1, \dots, p(x_n) = y_n$. We call
$p(t)$ an interpolating polynomial for the points
$(x_1, y_1), \dots, (x_n, y_n)$ because the graph of $p(t)$ passes
through the points.

3. Let $A = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Show that $A^3 = 0$. Use matrix
algebra to compute the product $(I - A)(I + A + A^2)$.

4. Suppose $A^n = 0$ for some $n > 1$. Find an inverse for $I - A$.


b. Suppose $x_1, \dots, x_n$ are distinct numbers. Show that the
columns of $V$ are linearly independent. [Hint: How many
zeros can a polynomial of degree $n - 1$ have?]

c. Prove: "If $x_1, \dots, x_n$ are distinct numbers, and $y_1, \dots, y_n$
are arbitrary numbers, then there is an interpolating polynomial
of degree $\le n - 1$ for $(x_1, y_1), \dots, (x_n, y_n)$."


5. Suppose an $n \times n$ matrix $A$ satisfies the equation $A^2 - 2A + I = 0$.
Show that $A^3 = 3A - 2I$ and $A^4 = 4A - 3I$.


6. Let A 


B 


0 


.These are Pauli spin 


12. Let $A = LU$, where $L$ is an invertible lower triangular matrix
and $U$ is upper triangular. Explain why the first column
of $A$ is a multiple of the first column of $L$. How is the second
column of $A$ related to the columns of $L$?

13. Given $\mathbf{u}$ in $\mathbb{R}^n$ with $\mathbf{u}^T\mathbf{u} = 1$, let $P = \mathbf{u}\mathbf{u}^T$ (an outer product)
and $Q = I - 2P$. Justify statements (a), (b), and (c).
a. $P^2 = P$ b. $P^T = P$ c. $Q^2 = I$
The transformation $\mathbf{x} \mapsto P\mathbf{x}$ is called a projection, and
$\mathbf{x} \mapsto Q\mathbf{x}$ is called a Householder reflection. Such reflections
are used in computer programs to create multiple zeros in a
vector (usually a column of a matrix).

14. Let $\mathbf{u} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$
and $\mathbf{x} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}$. Determine $P$ and $Q$ as in
Exercise 13, and compute $P\mathbf{x}$ and $Q\mathbf{x}$. The figure shows that
$Q\mathbf{x}$ is the reflection of $\mathbf{x}$ through the $x_1x_2$-plane.

[Figure: A Householder reflection through the plane $x_3 = 0$.]


15. Suppose $C = E_3E_2E_1B$, where $E_1$, $E_2$, and $E_3$ are elementary
matrices. Explain why $C$ is row equivalent to $B$.

16. Let $A$ be an $n \times n$ singular matrix. Describe how to construct
an $n \times n$ nonzero matrix $B$ such that $AB = 0$.

17. Let $A$ be a $6 \times 4$ matrix and $B$ a $4 \times 6$ matrix. Show that the
$6 \times 6$ matrix $AB$ cannot be invertible.

18. Suppose $A$ is a $5 \times 3$ matrix and there exists a $3 \times 5$ matrix
$C$ such that $CA = I_3$. Suppose further that for some given $\mathbf{b}$
in $\mathbb{R}^5$, the equation $A\mathbf{x} = \mathbf{b}$ has at least one solution. Show
that this solution is unique.

19. [M] Certain dynamical systems can be studied by examining
powers of a matrix, such as those below. Determine what
happens to $A^k$ and $B^k$ as $k$ increases (for example, try
$k = 2, \dots, 16$). Try to identify what is special about $A$ and
$B$. Investigate large powers of other matrices of this type,
and make a conjecture about such matrices.

\[
A = \begin{bmatrix} .4 & .2 & .3 \\ .3 & .6 & .3 \\ .3 & .2 & .4 \end{bmatrix}, \quad
B = \begin{bmatrix} 0 & .2 & .3 \\ .1 & .6 & .3 \\ .9 & .2 & .4 \end{bmatrix}
\]

20. [M] Let $A_n$ be the $n \times n$ matrix with 0's on the main diagonal
and 1's elsewhere. Compute $A_n^{-1}$ for $n = 4$, 5, and 6, and
make a conjecture about the general form of $A_n^{-1}$ for larger
values of $n$.


Determinants

INTRODUCTORY EXAMPLE 

Random Paths and Distortion 

In his autobiographical book “Surely You’re Joking, 
Mr. Feynman,” the Nobel Prize-winning physicist Richard 
Feynman tells of observing ants in his Princeton graduate 
school apartment. He studied the ants’ behavior by 
providing paper ferries to sugar suspended on a string 
where the ants would not accidentally find it. When an ant 
would step onto a paper ferry, Feynman would transport the 
ant to the food and then back. After the ants learned to use 
the ferry, he relocated the return landing. The colony soon 
confused the outbound and return ferry landings, indicating 
that their “learning” consisted of creating and following 
trails. Feynman confirmed this conjecture by laying glass 
slides on the floor. Once the ants established trails on the 
glass slides, he rearranged the slides and therefore the trails 
on them. The ants followed the repositioned trails and 
Feynman could direct the ants where he wished. 

Suppose Feynman had decided to conduct additional 
investigations using a globe built of wire mesh on which 
an ant must follow individual wires and choose between 
going left and right at each intersection. If several ants and 
an equal number of food sources are placed on the globe, 
how likely is it that each ant would find its own food source 
rather than encountering another ant’s trail and following 
it to a shared resource? 1 


1 The solution to the ant-path problem (and two other applications) can 
be found in a June 2005, Mathematical Monthly article by Arthur 
Benjamin and Naomi Cameron. 


In order to record the actual routes of the ants and to 
communicate the results to others, it is convenient to use 
a rectangular map of the globe. There are many ways to 
create such maps. One simple way is to use the longitude 
and latitude on the globe as x and y coordinates on the map. 
As is the case with all maps, the result is not a faithful 
representation of the globe. Features near the “equator” 
look much the same on the globe and the map, but regions 
near the “poles” of the globe are distorted. Images of polar 
regions are much larger than the images of similar sized 
regions near the equator. To fit in with its surroundings on 
the map, the image of an ant near one of the poles should 
be larger than one near the equator. How much larger? 

Surprisingly, both the ant-path and the area distortion 
problems are best answered through the use of the determi¬ 
nant, the subject of this chapter. Indeed, the determinant 
has so many uses that a summary of the applications known 
in the early 1900’s filled a four volume treatise by Thomas 
Muir. With changes in emphasis and the greatly increased 
sizes of the matrices used in modern applications, many 
uses that were important then are no longer critical today. 
Nevertheless, the determinant still plays an important role.

Beyond introducing the determinant in Section 3.1, this chapter presents two important
ideas. Section 3.2 derives an invertibility criterion for a square matrix that plays a pivotal 
role in Chapter 5. Section 3.3 shows how the determinant measures the amount by which 
a linear transformation changes the area of a figure. When applied locally, this technique 
answers the question of a map’s expansion rate near the poles. This idea plays a critical 
role in multivariable calculus in the form of the Jacobian. 


3.1 INTRODUCTION TO DETERMINANTS 


Recall from Section 2.2 that a $2 \times 2$ matrix is invertible if and only if its determinant
is nonzero. To extend this useful fact to larger matrices, we need a definition for the
determinant of an $n \times n$ matrix. We can discover the definition for the $3 \times 3$ case by
watching what happens when an invertible $3 \times 3$ matrix $A$ is row reduced.

Consider $A = [a_{ij}]$ with $a_{11} \neq 0$. If we multiply the second and third rows of $A$ by
$a_{11}$ and then subtract appropriate multiples of the first row from the other two rows, we
find that $A$ is row equivalent to the following two matrices:

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{11}a_{21} & a_{11}a_{22} & a_{11}a_{23} \\
a_{11}a_{31} & a_{11}a_{32} & a_{11}a_{33}
\end{bmatrix}
\sim
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
0 & a_{11}a_{22} - a_{12}a_{21} & a_{11}a_{23} - a_{13}a_{21} \\
0 & a_{11}a_{32} - a_{12}a_{31} & a_{11}a_{33} - a_{13}a_{31}
\end{bmatrix}
\tag{1}
\]

Since $A$ is invertible, either the (2,2)-entry or the (3,2)-entry on the right in (1) is
nonzero. Let us suppose that the (2,2)-entry is nonzero. (Otherwise, we can make a
row interchange before proceeding.) Multiply row 3 by $a_{11}a_{22} - a_{12}a_{21}$, and then to the
new row 3 add $-(a_{11}a_{32} - a_{12}a_{31})$ times row 2. This will show that

\[
A \sim
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
0 & a_{11}a_{22} - a_{12}a_{21} & a_{11}a_{23} - a_{13}a_{21} \\
0 & 0 & a_{11}\Delta
\end{bmatrix}
\]

where

\[
\Delta = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}
\tag{2}
\]

Since $A$ is invertible, $\Delta$ must be nonzero. The converse is true, too, as we will see in
Section 3.2. We call $\Delta$ in (2) the determinant of the $3 \times 3$ matrix $A$.

Recall that the determinant of a $2 \times 2$ matrix, $A = [a_{ij}]$, is the number

\[ \det A = a_{11}a_{22} - a_{12}a_{21} \]

For a $1 \times 1$ matrix, say $A = [a_{11}]$, we define $\det A = a_{11}$. To generalize the definition
of the determinant to larger matrices, we'll use $2 \times 2$ determinants to rewrite the
$3 \times 3$ determinant $\Delta$ described above. Since the terms in $\Delta$ can be grouped as

\[ (a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32}) - (a_{12}a_{21}a_{33} - a_{12}a_{23}a_{31}) + (a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}), \]

\[
\Delta = a_{11} \cdot \det \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}
- a_{12} \cdot \det \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}
+ a_{13} \cdot \det \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
\]

For brevity, write

\[ \Delta = a_{11} \cdot \det A_{11} - a_{12} \cdot \det A_{12} + a_{13} \cdot \det A_{13} \tag{3} \]

where $A_{11}$, $A_{12}$, and $A_{13}$ are obtained from $A$ by deleting the first row and one of the
three columns. For any square matrix $A$, let $A_{ij}$ denote the submatrix formed by deleting
the $i$th row and $j$th column of $A$. For instance, if

\[
A = \begin{bmatrix}
1 & -2 & 5 & 0 \\
2 & 0 & 4 & -1 \\
3 & 1 & 0 & 7 \\
0 & 4 & -2 & 0
\end{bmatrix}
\]

then $A_{32}$ is obtained by crossing out row 3 and column 2, so that

\[
A_{32} = \begin{bmatrix}
1 & 5 & 0 \\
2 & 4 & -1 \\
0 & -2 & 0
\end{bmatrix}
\]

We can now give a recursive definition of a determinant. When $n = 3$, $\det A$ is defined
using determinants of the $2 \times 2$ submatrices $A_{1j}$, as in (3) above. When $n = 4$, $\det A$
uses determinants of the $3 \times 3$ submatrices $A_{1j}$. In general, an $n \times n$ determinant is
defined by determinants of $(n - 1) \times (n - 1)$ submatrices.


DEFINITION

For $n \ge 2$, the determinant of an $n \times n$ matrix $A = [a_{ij}]$ is the sum of $n$ terms
of the form $\pm a_{1j} \det A_{1j}$, with plus and minus signs alternating, where the entries
$a_{11}, a_{12}, \dots, a_{1n}$ are from the first row of $A$. In symbols,

\[
\det A = a_{11} \det A_{11} - a_{12} \det A_{12} + \cdots + (-1)^{1+n} a_{1n} \det A_{1n}
= \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det A_{1j}
\]


EXAMPLE 1 Compute the determinant of

\[
A = \begin{bmatrix} 1 & 5 & 0 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{bmatrix}
\]

SOLUTION Compute $\det A = a_{11} \det A_{11} - a_{12} \det A_{12} + a_{13} \det A_{13}$:

\[
\begin{aligned}
\det A &= 1 \cdot \det \begin{bmatrix} 4 & -1 \\ -2 & 0 \end{bmatrix}
- 5 \cdot \det \begin{bmatrix} 2 & -1 \\ 0 & 0 \end{bmatrix}
+ 0 \cdot \det \begin{bmatrix} 2 & 4 \\ 0 & -2 \end{bmatrix} \\
&= 1(0 - 2) - 5(0 - 0) + 0(-4 - 0) = -2
\end{aligned}
\]

■

Another common notation for the determinant of a matrix uses a pair of vertical
lines in place of brackets. Thus the calculation in Example 1 can be written as

\[
\det A = 1 \begin{vmatrix} 4 & -1 \\ -2 & 0 \end{vmatrix}
- 5 \begin{vmatrix} 2 & -1 \\ 0 & 0 \end{vmatrix}
+ 0 \begin{vmatrix} 2 & 4 \\ 0 & -2 \end{vmatrix}
= \cdots = -2
\]

To state the next theorem, it is convenient to write the definition of $\det A$ in a slightly
different form. Given $A = [a_{ij}]$, the $(i, j)$-cofactor of $A$ is the number $C_{ij}$ given by

\[ C_{ij} = (-1)^{i+j} \det A_{ij} \tag{4} \]

Then

\[ \det A = a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n} \]

This formula is called a cofactor expansion across the first row of $A$. We omit the
proof of the following fundamental theorem to avoid a lengthy digression.


THEOREM 1

The determinant of an $n \times n$ matrix $A$ can be computed by a cofactor expansion
across any row or down any column. The expansion across the $i$th row using the
cofactors in (4) is

\[ \det A = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \]

The cofactor expansion down the $j$th column is

\[ \det A = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \]


The plus or minus sign in the $(i, j)$-cofactor depends on the position of $a_{ij}$ in the
matrix, regardless of the sign of $a_{ij}$ itself. The factor $(-1)^{i+j}$ determines the following
checkerboard pattern of signs:

\[
\begin{bmatrix}
+ & - & + & \cdots \\
- & + & - & \\
+ & - & + & \\
\vdots & & & \ddots
\end{bmatrix}
\]

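EXAMPLE 2 Use a cofactor expansion across the third row to compute $\det A$, where

\[
A = \begin{bmatrix} 1 & 5 & 0 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{bmatrix}
\]

SOLUTION Compute

\[
\begin{aligned}
\det A &= a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33} \\
&= (-1)^{3+1} a_{31} \det A_{31} + (-1)^{3+2} a_{32} \det A_{32} + (-1)^{3+3} a_{33} \det A_{33} \\
&= 0 \begin{vmatrix} 5 & 0 \\ 4 & -1 \end{vmatrix}
- (-2) \begin{vmatrix} 1 & 0 \\ 2 & -1 \end{vmatrix}
+ 0 \begin{vmatrix} 1 & 5 \\ 2 & 4 \end{vmatrix} \\
&= 0 + 2(-1) + 0 = -2
\end{aligned}
\]

■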


Theorem 1 is helpful for computing the determinant of a matrix that contains many 
zeros. For example, if a row is mostly zeros, then the cofactor expansion across that row 
has many terms that are zero, and the cofactors in those terms need not be calculated. 
The same approach works with a column that contains many zeros. 


EXAMPLE 3 Compute $\det A$, where

\[
A = \begin{bmatrix}
3 & -7 & 8 & 9 & -6 \\
0 & 2 & -5 & 7 & 3 \\
0 & 0 & 1 & 5 & 0 \\
0 & 0 & 2 & 4 & -1 \\
0 & 0 & 0 & -2 & 0
\end{bmatrix}
\]

SOLUTION The cofactor expansion down the first column of $A$ has all terms equal to
zero except the first. Thus

\[
\det A = 3 \cdot \begin{vmatrix}
2 & -5 & 7 & 3 \\
0 & 1 & 5 & 0 \\
0 & 2 & 4 & -1 \\
0 & 0 & -2 & 0
\end{vmatrix}
+ 0 \cdot C_{21} + 0 \cdot C_{31} + 0 \cdot C_{41} + 0 \cdot C_{51}
\]

Henceforth we will omit the zero terms in the cofactor expansion. Next, expand this
$4 \times 4$ determinant down the first column, in order to take advantage of the zeros there.
We have

\[
\det A = 3 \cdot 2 \cdot \begin{vmatrix} 1 & 5 & 0 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{vmatrix}
\]

This $3 \times 3$ determinant was computed in Example 1 and found to equal $-2$. Hence
$\det A = 3 \cdot 2 \cdot (-2) = -12$. ■

The matrix in Example 3 was nearly triangular. The method in that example is 
easily adapted to prove the following theorem. 


THEOREM 2 If $A$ is a triangular matrix, then $\det A$ is the product of the entries on the main
diagonal of $A$.


The strategy in Example 3 of looking for zeros works extremely well when an entire 
row or column consists of zeros. In such a case, the cofactor expansion along such a row 
or column is a sum of zeros! So the determinant is zero. Unfortunately, most cofactor 
expansions are not so quickly evaluated. 

NUMERICAL NOTE

By today's standards, a $25 \times 25$ matrix is small. Yet it would be impossible to
calculate a $25 \times 25$ determinant by cofactor expansion. In general, a cofactor
expansion requires over $n!$ multiplications, and $25!$ is approximately $1.5 \times 10^{25}$.

If a computer performs one trillion multiplications per second, it would have
to run for over 500,000 years to compute a $25 \times 25$ determinant by this method.
Fortunately, there are faster methods, as we'll soon discover.
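
The order of magnitude in this note is easy to confirm. The sketch below counts only $n!$ multiplications, which the note describes as a lower bound ("over $n!$"), so the figure it prints is a conservative estimate of the running time. The variable names and the assumed year length of 365.25 days are choices made here for illustration.

```python
import math

n = 25
multiplications = math.factorial(n)      # lower bound on the work: n!
rate = 1e12                              # one trillion multiplications per second
seconds = multiplications / rate
years = seconds / (365.25 * 24 * 3600)   # assumed year length in seconds

print(f"{n}! = {multiplications:.3e}")
print(f"at least {years:,.0f} years")
```

Because a cofactor expansion needs more than $n!$ multiplications, the true running time exceeds this estimate of roughly half a million years.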


Exercises 19-38 explore important properties of determinants, mostly for the 2x2 
case. The results from Exercises 33-36 will be used in the next section to derive the 
analogous properties for n x n matrices. 


PRACTICE PROBLEM 


Compute
$\begin{vmatrix}
5 & -7 & 2 & 2 \\
0 & 3 & 0 & -4 \\
-5 & -8 & 0 & 3 \\
0 & 5 & 0 & -6
\end{vmatrix}$.

3.1 EXERCISES 

Compute the determinants in Exercises 1-8 using a cofactor 
expansion across the first row. In Exercises 1-4, also compute the 
determinant by a cofactor expansion down the second column. 


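1. $\begin{vmatrix} 3 & 0 & 4 \\ 2 & 3 & 2 \\ 0 & 5 & -1 \end{vmatrix}$

2. $\begin{vmatrix} 0 & 5 & 1 \\ 4 & -3 & 0 \\ 2 & 4 & 1 \end{vmatrix}$

3. $\begin{vmatrix} 2 & -4 & 3 \\ 3 & 1 & 2 \\ 1 & 4 & -1 \end{vmatrix}$

4. $\begin{vmatrix} 1 & 3 & 5 \\ 2 & 1 & 1 \\ 3 & 4 & 2 \end{vmatrix}$

5. $\begin{vmatrix} 2 & 3 & -4 \\ 4 & 0 & 5 \\ 5 & 1 & 6 \end{vmatrix}$

6. $\begin{vmatrix} 5 & -2 & 4 \\ 0 & 3 & -5 \\ 2 & -4 & 7 \end{vmatrix}$

Compute the determinants of the elementary matrices given in
Exercises 25-30. (See Section 2.2.)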


Compute the determinants in Exercises 9-14 by cofactor expansions.
At each step, choose a row or column that involves the least
amount of computation.




23. 


1 1 1 

-3 8 -4 

2-3 2 


13. 


14. 


6 3 2 4 0 

9 0-4 1 0 

8-5671 

3 0 0 0 0 

4 2 3 2 0 


The expansion of a $3 \times 3$ determinant can be remembered by the
following device. Write a second copy of the first two columns to
the right of the matrix, and compute the determinant by multiplying
entries on six diagonals:

\[
\begin{array}{ccc|cc}
a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\
a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\
a_{31} & a_{32} & a_{33} & a_{31} & a_{32}
\end{array}
\]

Add the downward diagonal products and subtract the upward
products. Use this method to compute the determinants in Exercises
15-18. Warning: This trick does not generalize in any
reasonable way to $4 \times 4$ or larger matrices.
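
As a rough illustration of the device just described, the six diagonal products can be written out explicitly. The function name `det3_by_diagonals` is an assumption introduced here, and the matrix used is the one from Example 1 of this section rather than one of the exercise matrices.

```python
def det3_by_diagonals(m):
    """3 x 3 determinant via the six-diagonal device (3 x 3 matrices only)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    # Downward diagonals of the augmented array are added.
    down = a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
    # Upward diagonals are subtracted.
    up = a31 * a22 * a13 + a32 * a23 * a11 + a33 * a21 * a12
    return down - up


A = [[1, 5, 0], [2, 4, -1], [0, -2, 0]]
print(det3_by_diagonals(A))
```

The result matches the value $-2$ obtained by cofactor expansion in Example 1. The function deliberately rejects anything other than a $3 \times 3$ input (the tuple unpacking fails), in keeping with the warning above.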


Use Exercises 25-28 to answer the questions in Exercises 31 and 
32. Give reasons for your answers. 

31. What is the determinant of an elementary row replacement 
matrix? 

32. What is the determinant of an elementary scaling matrix with 
k on the diagonal? 


In Exercises 33-36, verify that $\det EA = (\det E)(\det A)$, where
$E$ is the elementary matrix shown and $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

33.

35.


37. Let A : 


.Write 5A. Is det 5A = 5detv4? 


38. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and let $k$ be a scalar. Find a formula that
relates $\det kA$ to $k$ and $\det A$.


In Exercises 39 and 40, $A$ is an $n \times n$ matrix. Mark each statement
True or False. Justify each answer.

39. a. An $n \times n$ determinant is defined by determinants of
$(n - 1) \times (n - 1)$ submatrices.

b. The $(i, j)$-cofactor of a matrix $A$ is the matrix $A_{ij}$ obtained
by deleting from $A$ its $i$th row and $j$th column.

40. a. The cofactor expansion of $\det A$ down a column is the
negative of the cofactor expansion along a row.


c d 


In Exercises 19-24, explore the effect of an elementary row 
operation on the determinant of a matrix. In each case, state the 
row operation and describe how it affects the determinant. 


19. 


a 

b' 


c 

d' 

20 . 

a 

b' 


a 

b ■ 

c 

d 


a 

b 

c 

d 

' 

kc 

kd 



3 

2 

2 


a 

b 

c 


6 

5 

6 


d 



k 

k 

k 

， 

-3 

8 

-4 


2 

-3 

2 


0 

r 

34. 

1 

0 

1 

o_ 

0 

k 

" 1 

k 

36. 

" 1 

k 

O' 

0 

l 

1 


15. 


17. 


3 

0 

4 


0 

5 

1 

2 

3 

2 

16. 

4 

-3 

0 

0 

5 

-1 


2 

4 

1 

2 

-4 

3 


1 

3 

5 

3 

1 

2 

18. 

2 

1 

1 

1 

4 

-1 


3 

4 

2 


9. 


11 . 


6 

0 

0 

5 


1 

-2 

5 

2 

1 

7 

2 

-5 

1 A 

0 

0 

3 

0 

2 

0 

0 

0 

1U. 

2 

—6 

-7 

5 

8 

3 

1 

8 


5 

0 

4 

4 

3 

5 

-8 

4 


4 

0 

0 

0 

0 

-2 

3 

-7 

12 . 

7 

-1 

0 

0 

0 

0 

1 

5 

2 

6 

3 

0 

0 

0 

0 

2 


5 

-8 

4 

-3 


4 

3 

0 


8 

1 

6 

21 . 

'3 4" 


3 

4 

6 

5 

2 

8 . 

4 

0 

3 

5 6 _ 


■ 5 + 3A: 

6 + 4^_ 

9 

7 

3 


3 

-2 

5 


a b 


a kc 

b + kd 



1 

0 

0 


1 

0 

0 

25. 

0 

1 

0 

26. 

0 

1 

0 


_0 

k 

1 _ 


k 

0 

1 _ 


"k 

0 

0 ~ 


'1 

0 

0 " 

27. 

0 

1 

0 

28. 

0 

k 

0 


_ 0 

0 

1 _ 


_0 

0 

1 _ 


"0 

1 

0 " 


"0 

0 

1 一 

29. 

1 

0 

0 

30. 

0 

1 

0 


0 

0 

1 


1 

0 

0 


c 2 6 

办 2 5 

a3 6 



1i 

3 4 


5 0 8 3 2 
_ I _ 

3 0 4 2 1 

- 

7 2 6 5 9 

- I 

0 0 3 0 0 

4 0 7 5 0 
b. The determinant of a triangular matrix is the sum of the
entries on the main diagonal.


41. Let $\mathbf{u}$ = [entries illegible] and $\mathbf{v}$ = [entries illegible]. Compute the area of the
parallelogram determined by $\mathbf{u}$, $\mathbf{v}$, $\mathbf{u} + \mathbf{v}$, and $\mathbf{0}$, and compute
the determinant of $[\,\mathbf{u} \ \mathbf{v}\,]$. How do they compare? Replace
the first entry of $\mathbf{v}$ by an arbitrary number $x$, and repeat the
problem. Draw a picture and explain what you find.

42. Let $\mathbf{u} = \begin{bmatrix} a \\ b \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} c \\ 0 \end{bmatrix}$, where $a$, $b$, $c$ are positive (for
simplicity). Compute the area of the parallelogram determined
by $\mathbf{u}$, $\mathbf{v}$, $\mathbf{u} + \mathbf{v}$, and $\mathbf{0}$, and compute the determinants of
the matrices $[\,\mathbf{u} \ \mathbf{v}\,]$ and $[\,\mathbf{v} \ \mathbf{u}\,]$. Draw a picture and explain
what you find.

43. [M] Is it true that $\det(A + B) = \det A + \det B$? To find
out, generate random $5 \times 5$ matrices $A$ and $B$, and compute
$\det(A + B) - \det A - \det B$. (Refer to Exercise 37 in Section 2.1.)
Repeat the calculations for three other pairs of
$n \times n$ matrices, for various values of $n$. Report your results.

44. [M] Is it true that $\det AB = (\det A)(\det B)$? Experiment
with four pairs of random matrices as in Exercise 43, and
make a conjecture.

45. [M] Construct a random $4 \times 4$ matrix $A$ with integer entries
between $-9$ and 9, and compare $\det A$ with $\det A^T$, $\det(-A)$,
$\det(2A)$, and $\det(10A)$. Repeat with two other random $4 \times 4$
integer matrices, and make conjectures about how these determinants
are related. (Refer to Exercise 36 in Section 2.1.)
Then check your conjectures with several random $5 \times 5$ and
$6 \times 6$ integer matrices. Modify your conjectures, if necessary,
and report your results.

46. [M] How is $\det A^{-1}$ related to $\det A$? Experiment with
random $n \times n$ integer matrices for $n = 4$, 5, and 6, and make
a conjecture. Note: In the unlikely event that you encounter
a matrix with a zero determinant, reduce it to echelon form
and discuss what you find.


SOLUTION TO PRACTICE PROBLEM 


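Take advantage of the zeros. Begin with a cofactor expansion down the third column to
obtain a $3 \times 3$ matrix, which may be evaluated by an expansion down its first column.

\[
\begin{aligned}
\begin{vmatrix}
5 & -7 & 2 & 2 \\
0 & 3 & 0 & -4 \\
-5 & -8 & 0 & 3 \\
0 & 5 & 0 & -6
\end{vmatrix}
&= (-1)^{1+3}\, 2 \begin{vmatrix} 0 & 3 & -4 \\ -5 & -8 & 3 \\ 0 & 5 & -6 \end{vmatrix} \\
&= 2 \cdot (-1)^{2+1} (-5) \begin{vmatrix} 3 & -4 \\ 5 & -6 \end{vmatrix} \\
&= 20
\end{aligned}
\]

The $(-1)^{2+1}$ in the next-to-last calculation came from the (2,1)-position of the $-5$ in
the $3 \times 3$ determinant.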


3.2 PROPERTIES OF DETERMINANTS 

The secret of determinants lies in how they change when row operations are performed. 
The following theorem generalizes the results of Exercises 19-24 in Section 3.1. The 
proof is at the end of this section. 


THEOREM 3 Row Operations

Let $A$ be a square matrix.

a. If a multiple of one row of $A$ is added to another row to produce a matrix $B$,
then $\det B = \det A$.

b. If two rows of $A$ are interchanged to produce $B$, then $\det B = -\det A$.

c. If one row of $A$ is multiplied by $k$ to produce $B$, then $\det B = k \cdot \det A$.


The following examples show how to use Theorem 3 to find determinants
efficiently.

EXAMPLE 1 Compute $\det A$, where
$A = \begin{bmatrix} 1 & -4 & 2 \\ -2 & 8 & -9 \\ -1 & 7 & 0 \end{bmatrix}$.

SOLUTION The strategy is to reduce $A$ to echelon form and then to use the fact that
the determinant of a triangular matrix is the product of the diagonal entries. The first
two row replacements in column 1 do not change the determinant:

\[
\det A = \begin{vmatrix} 1 & -4 & 2 \\ -2 & 8 & -9 \\ -1 & 7 & 0 \end{vmatrix}
= \begin{vmatrix} 1 & -4 & 2 \\ 0 & 0 & -5 \\ -1 & 7 & 0 \end{vmatrix}
= \begin{vmatrix} 1 & -4 & 2 \\ 0 & 0 & -5 \\ 0 & 3 & 2 \end{vmatrix}
\]

An interchange of rows 2 and 3 reverses the sign of the determinant, so

\[
\det A = -\begin{vmatrix} 1 & -4 & 2 \\ 0 & 3 & 2 \\ 0 & 0 & -5 \end{vmatrix}
= -(1)(3)(-5) = 15
\]

■

A common use of Theorem 3(c) in hand calculations is to factor out a common
multiple of one row of a matrix. For instance,

\[
\begin{vmatrix} * & * & * \\ 5k & -2k & 3k \\ * & * & * \end{vmatrix}
= k \begin{vmatrix} * & * & * \\ 5 & -2 & 3 \\ * & * & * \end{vmatrix}
\]

where the starred entries are unchanged. We use this step in the next example.


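EXAMPLE 2 Compute $\det A$, where
$A = \begin{bmatrix} 2 & -8 & 6 & 8 \\ 3 & -9 & 5 & 10 \\ -3 & 0 & 1 & -2 \\ 1 & -4 & 0 & 6 \end{bmatrix}$.

SOLUTION To simplify the arithmetic, we want a 1 in the upper-left corner. We could
interchange rows 1 and 4. Instead, we factor out 2 from the top row, and then proceed
with row replacements in the first column:

\[
\det A = 2 \begin{vmatrix} 1 & -4 & 3 & 4 \\ 3 & -9 & 5 & 10 \\ -3 & 0 & 1 & -2 \\ 1 & -4 & 0 & 6 \end{vmatrix}
= 2 \begin{vmatrix} 1 & -4 & 3 & 4 \\ 0 & 3 & -4 & -2 \\ 0 & -12 & 10 & 10 \\ 0 & 0 & -3 & 2 \end{vmatrix}
\]

Next, we could factor out another 2 from row 3 or use the 3 in the second column as a
pivot. We choose the latter operation, adding 4 times row 2 to row 3:

\[
\det A = 2 \begin{vmatrix} 1 & -4 & 3 & 4 \\ 0 & 3 & -4 & -2 \\ 0 & 0 & -6 & 2 \\ 0 & 0 & -3 & 2 \end{vmatrix}
\]

Finally, adding $-1/2$ times row 3 to row 4, and computing the "triangular" determinant,
we find that

\[
\det A = 2 \begin{vmatrix} 1 & -4 & 3 & 4 \\ 0 & 3 & -4 & -2 \\ 0 & 0 & -6 & 2 \\ 0 & 0 & 0 & 1 \end{vmatrix}
= 2 \cdot (1)(3)(-6)(1) = -36
\]

■

\[
U = \begin{bmatrix} \blacksquare & * & * & * \\ 0 & \blacksquare & * & * \\ 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & \blacksquare \end{bmatrix},
\ \det U \neq 0
\qquad
U = \begin{bmatrix} \blacksquare & * & * & * \\ 0 & \blacksquare & * & * \\ 0 & 0 & 0 & \blacksquare \\ 0 & 0 & 0 & 0 \end{bmatrix},
\ \det U = 0
\]

FIGURE 1 Typical echelon forms of square matrices.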

Suppose a square matrix $A$ has been reduced to an echelon form $U$ by row replacements and row interchanges. (This is always possible. See the row reduction algorithm in Section 1.2.) If there are $r$ interchanges, then Theorem 3 shows that

$$\det A = (-1)^r \det U$$

Since $U$ is in echelon form, it is triangular, and so $\det U$ is the product of the diagonal entries $u_{11}, \ldots, u_{nn}$. If $A$ is invertible, the entries $u_{ii}$ are all pivots (because $A \sim I_n$ and the $u_{ii}$ have not been scaled to 1's). Otherwise, at least $u_{nn}$ is zero, and the product $u_{11} \cdots u_{nn}$ is zero. See Fig. 1. Thus

$$
\det A = \begin{cases} (-1)^r \cdot \left(\text{product of pivots in } U\right) & \text{when } A \text{ is invertible} \\ 0 & \text{when } A \text{ is not invertible} \end{cases} \tag{1}
$$


It is interesting to note that although the echelon form U described above is not unique 
(because it is not completely row reduced), and the pivots are not unique, the product 
of the pivots is unique, except for a possible minus sign. 
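
As a rough illustration of formula (1), the following Python sketch reduces a matrix to echelon form using only row replacements and interchanges, counts the interchanges through a sign variable, and multiplies the diagonal entries of the result. The function name `det_by_row_reduction` and the use of exact `Fraction` arithmetic are choices made here for illustration; they are not part of the text.

```python
from fractions import Fraction

def det_by_row_reduction(rows):
    """Determinant via formula (1): reduce to echelon form with row
    replacements and interchanges, then multiply the pivots by (-1)^r."""
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    sign = 1  # tracks (-1)^r, where r counts the interchanges
    for j in range(n):
        # Look for a nonzero entry in column j, at or below row j.
        p = next((i for i in range(j, n) if a[i][j] != 0), None)
        if p is None:
            return Fraction(0)  # fewer than n pivots: A is not invertible
        if p != j:
            a[j], a[p] = a[p], a[j]  # interchange: reverses the sign
            sign = -sign
        for i in range(j + 1, n):  # row replacements: determinant unchanged
            m = a[i][j] / a[j][j]
            a[i] = [x - m * y for x, y in zip(a[i], a[j])]
    result = Fraction(sign)
    for j in range(n):
        result *= a[j][j]
    return result

# The matrix of Example 2.
A = [[2, -8, 6, 8], [3, -9, 5, 10], [-3, 0, 1, -2], [1, -4, 0, 6]]
print(det_by_row_reduction(A))  # -36
```

The sketch never scales a row, so the only bookkeeping needed is the sign change for each interchange, exactly as in formula (1).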

Formula (1) not only gives a concrete interpretation of $\det A$ but also proves the main theorem of this section:


THEOREM 4 A square matrix $A$ is invertible if and only if $\det A \neq 0$.


Theorem 4 adds the statement "$\det A \neq 0$" to the Invertible Matrix Theorem. A useful corollary is that $\det A = 0$ when the columns of $A$ are linearly dependent. Also, $\det A = 0$ when the rows of $A$ are linearly dependent. (Rows of $A$ are columns of $A^T$, and linearly dependent columns of $A^T$ make $A^T$ singular. When $A^T$ is singular, so is $A$, by the Invertible Matrix Theorem.) In practice, linear dependence is obvious when two columns or two rows are the same or a column or a row is zero.


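EXAMPLE 3 Compute $\det A$, where $A = \begin{bmatrix} 3 & -1 & 2 & -5 \\ 0 & 5 & -3 & -6 \\ -6 & 7 & -7 & 4 \\ -5 & -8 & 0 & 9 \end{bmatrix}$.


SOLUTION Add 2 times row 1 to row 3 to obtain

$$
\det A = \det \begin{bmatrix} 3 & -1 & 2 & -5 \\ 0 & 5 & -3 & -6 \\ 0 & 5 & -3 & -6 \\ -5 & -8 & 0 & 9 \end{bmatrix} = 0
$$

because the second and third rows of the second matrix are equal.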


■ 


NUMERICAL NOTES

1. Most computer programs that compute $\det A$ for a general matrix $A$ use the method of formula (1) above.

2. It can be shown that evaluation of an $n \times n$ determinant using row operations requires about $2n^3/3$ arithmetic operations. Any modern microcomputer can calculate a $25 \times 25$ determinant in a fraction of a second, since only about 10,000 operations are required.


Computers can also handle large “sparse” matrices, with special routines that take advantage of the presence of many zeros. Of course, zero entries can speed hand computations, too. The calculations in the next example combine the power of row operations with the strategy from Section 3.1 of using zero entries in cofactor expansions.


EXAMPLE 4 Compute $\det A$, where $A = \begin{bmatrix} 0 & 1 & 2 & -1 \\ 2 & 5 & -7 & 3 \\ 0 & 3 & 6 & 2 \\ -2 & -5 & 4 & -2 \end{bmatrix}$.


SOLUTION A good way to begin is to use the 2 in column 1 as a pivot, eliminating 
the —2 below it. Then use a cofactor expansion to reduce the size of the determinant, 
followed by another row replacement operation. Thus 


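$$
\det A = \begin{vmatrix} 0 & 1 & 2 & -1 \\ 2 & 5 & -7 & 3 \\ 0 & 3 & 6 & 2 \\ 0 & 0 & -3 & 1 \end{vmatrix}
= -2\begin{vmatrix} 1 & 2 & -1 \\ 3 & 6 & 2 \\ 0 & -3 & 1 \end{vmatrix}
= -2\begin{vmatrix} 1 & 2 & -1 \\ 0 & 0 & 5 \\ 0 & -3 & 1 \end{vmatrix}
$$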


An interchange of rows 2 and 3 would produce a “triangular determinant.” Another 
approach is to make a cofactor expansion down the first column: 


$$
\det A = (-2)(1)\begin{vmatrix} 0 & 5 \\ -3 & 1 \end{vmatrix} = -2 \cdot (15) = -30
$$


■ 


Column Operations 

We can perform operations on the columns of a matrix in a way that is analogous to the 
row operations we have considered. The next theorem shows that column operations 
have the same effects on determinants as row operations. 


THEOREM 5 If $A$ is an $n \times n$ matrix, then $\det A^T = \det A$.


PROOF The theorem is obvious for $n = 1$. Suppose the theorem is true for $k \times k$ determinants and let $n = k + 1$. Then the cofactor of $a_{1j}$ in $A$ equals the cofactor of $a_{j1}$ in $A^T$, because the cofactors involve $k \times k$ determinants. Hence the cofactor expansion of $\det A$ along the first row equals the cofactor expansion of $\det A^T$ down the first column. That is, $A$ and $A^T$ have equal determinants. Thus the theorem is true for $n = 1$, and the truth of the theorem for one value of $n$ implies its truth for the next value of $n$. By the principle of induction, the theorem is true for all $n \geq 1$. ■

Because of Theorem 5, each statement in Theorem 3 is true when the word row is replaced everywhere by column. To verify this property, one merely applies the original Theorem 3 to $A^T$. A row operation on $A^T$ amounts to a column operation on $A$.
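
The statements about transposes and column operations can be checked numerically. The sketch below is only an illustration: it uses a straightforward cofactor expansion across the first row (the helper names `det` and `transpose` are made up here), applies it to the matrix of Example 4, and compares the results after one column replacement and one column interchange chosen arbitrarily.

```python
def det(m):
    """Cofactor expansion across the first row (as in Section 3.1)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def transpose(m):
    return [list(col) for col in zip(*m)]

# The matrix of Example 4.
A = [[0, 1, 2, -1], [2, 5, -7, 3], [0, 3, 6, 2], [-2, -5, 4, -2]]

# Column replacement: add 3 times column 1 to column 2.
B = [[r[0], r[1] + 3 * r[0], r[2], r[3]] for r in A]
# Column interchange: swap columns 1 and 2.
C = [[r[1], r[0], r[2], r[3]] for r in A]

print(det(A), det(transpose(A)))  # -30 -30
print(det(B), det(C))             # -30 30
```

Cofactor expansion is used here only because it is short to write; for larger matrices the row reduction method is far cheaper.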

Column operations are useful for both theoretical purposes and hand computations. 
However, for simplicity we’ll perform only row operations in numerical calculations. 

Determinants and Matrix Products 

The proof of the following useful theorem is at the end of the section. Applications are in the exercises.

THEOREM 6 Multiplicative Property

If $A$ and $B$ are $n \times n$ matrices, then $\det AB = (\det A)(\det B)$.


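EXAMPLE 5 Verify Theorem 6 for $A = \begin{bmatrix} 6 & 1 \\ 3 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 3 \\ 1 & 2 \end{bmatrix}$.


SOLUTION

$$
AB = \begin{bmatrix} 6 & 1 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 4 & 3 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 25 & 20 \\ 14 & 13 \end{bmatrix}
$$

and

$$\det AB = 25 \cdot 13 - 20 \cdot 14 = 325 - 280 = 45$$

Since $\det A = 9$ and $\det B = 5$,

$$(\det A)(\det B) = 9 \cdot 5 = 45 = \det AB$$

■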

Warning: A common misconception is that Theorem 6 has an analogue for sums of matrices. However, $\det(A + B)$ is not equal to $\det A + \det B$, in general.
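
Both Theorem 6 and the warning can be checked with the matrices of Example 5. The following lines are a small sketch, with helper names (`det2`, `matmul`) invented for this illustration.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[6, 1], [3, 2]]
B = [[4, 3], [1, 2]]

AB = matmul(A, B)                                         # [[25, 20], [14, 13]]
S = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]  # A + B

print(det2(AB), det2(A) * det2(B))  # 45 45
print(det2(S), det2(A) + det2(B))   # 24 14
```

The product rule holds, while the determinant of the sum (24) differs from the sum of the determinants (14).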


A Linearity Property of the Determinant Function 

For an $n \times n$ matrix $A$, we can consider $\det A$ as a function of the $n$ column vectors in $A$. We will show that if all columns except one are held fixed, then $\det A$ is a linear function of that one (vector) variable.

Suppose that the $j$th column of $A$ is allowed to vary, and write

$$A = [\, \mathbf{a}_1 \ \cdots \ \mathbf{a}_{j-1} \ \ \mathbf{x} \ \ \mathbf{a}_{j+1} \ \cdots \ \mathbf{a}_n \,]$$

Define a transformation $T$ from $\mathbb{R}^n$ to $\mathbb{R}$ by

$$T(\mathbf{x}) = \det [\, \mathbf{a}_1 \ \cdots \ \mathbf{a}_{j-1} \ \ \mathbf{x} \ \ \mathbf{a}_{j+1} \ \cdots \ \mathbf{a}_n \,]$$

Then,

$$T(c\mathbf{x}) = cT(\mathbf{x}) \quad \text{for all scalars } c \text{ and all } \mathbf{x} \text{ in } \mathbb{R}^n \tag{2}$$

$$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \quad \text{for all } \mathbf{u}, \mathbf{v} \text{ in } \mathbb{R}^n \tag{3}$$

Property (2) is Theorem 3(c) applied to the columns of $A$. A proof of property (3) follows from a cofactor expansion of $\det A$ down the $j$th column. (See Exercise 43.) This (multi-) linearity property of the determinant turns out to have many useful consequences that are studied in more advanced courses.
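
Properties (2) and (3) are easy to test on a small case. In the sketch below, the fixed columns `a1` and `a3` and the test vectors `u`, `v` are arbitrary values chosen for illustration, and the second column plays the role of the variable column.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def T(x, a1=(1, 0, 2), a3=(0, 1, 1)):
    """T(x) = det [a1 x a3], with the first and third columns held fixed."""
    rows = [[a1[k], x[k], a3[k]] for k in range(3)]
    return det3(rows)

u = (2, -1, 3)
v = (0, 4, 1)
c = 5

cu = tuple(c * t for t in u)
u_plus_v = tuple(s + t for s, t in zip(u, v))

assert T(cu) == c * T(u)            # property (2)
assert T(u_plus_v) == T(u) + T(v)   # property (3)
```

Note that the check varies one column only; the determinant is not a linear function of the whole matrix.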


Proofs of Theorems 3 and 6 

It is convenient to prove Theorem 3 when it is stated in terms of the elementary matrices discussed in Section 2.2. We call an elementary matrix $E$ a row replacement (matrix) if $E$ is obtained from the identity $I$ by adding a multiple of one row to another row; $E$ is an interchange if $E$ is obtained by interchanging two rows of $I$; and $E$ is a scale by $r$ if $E$ is obtained by multiplying a row of $I$ by a nonzero scalar $r$. With this terminology, Theorem 3 can be reformulated as follows:

If $A$ is an $n \times n$ matrix and $E$ is an $n \times n$ elementary matrix, then

$$\det EA = (\det E)(\det A)$$

where

$$
\det E = \begin{cases} 1 & \text{if } E \text{ is a row replacement} \\ -1 & \text{if } E \text{ is an interchange} \\ r & \text{if } E \text{ is a scale by } r \end{cases}
$$

PROOF OF THEOREM 3 The proof is by induction on the size of $A$. The case of a $2 \times 2$ matrix was verified in Exercises 33-36 of Section 3.1. Suppose the theorem has been verified for determinants of $k \times k$ matrices with $k \geq 2$, let $n = k + 1$, and let $A$ be $n \times n$. The action of $E$ on $A$ involves either two rows or only one row. So we can expand $\det EA$ across a row that is unchanged by the action of $E$, say, row $i$. Let $A_{ij}$ (respectively, $B_{ij}$) be the matrix obtained by deleting row $i$ and column $j$ from $A$ (respectively, $EA$). Then the rows of $B_{ij}$ are obtained from the rows of $A_{ij}$ by the same type of elementary row operation that $E$ performs on $A$. Since these submatrices are only $k \times k$, the induction assumption implies that

$$\det B_{ij} = \alpha \cdot \det A_{ij}$$

where $\alpha = 1$, $-1$, or $r$, depending on the nature of $E$. The cofactor expansion across row $i$ is

$$
\begin{aligned}
\det EA &= a_{i1}(-1)^{i+1} \det B_{i1} + \cdots + a_{in}(-1)^{i+n} \det B_{in} \\
&= \alpha a_{i1}(-1)^{i+1} \det A_{i1} + \cdots + \alpha a_{in}(-1)^{i+n} \det A_{in} \\
&= \alpha \cdot \det A
\end{aligned}
$$

In particular, taking $A = I_n$, we see that $\det E = 1$, $-1$, or $r$, depending on the nature of $E$. Thus the theorem is true for $n = 2$, and the truth of the theorem for one value of $n$ implies its truth for the next value of $n$. By the principle of induction, the theorem must be true for $n \geq 2$. The theorem is trivially true for $n = 1$. ■


PROOF OF THEOREM 6 If $A$ is not invertible, then neither is $AB$, by Exercise 27 in Section 2.3. In this case, $\det AB = (\det A)(\det B)$, because both sides are zero, by Theorem 4. If $A$ is invertible, then $A$ and the identity matrix $I_n$ are row equivalent by the Invertible Matrix Theorem. So there exist elementary matrices $E_1, \ldots, E_p$ such that

$$A = E_p E_{p-1} \cdots E_1 \cdot I_n = E_p E_{p-1} \cdots E_1$$

For brevity, write $|A|$ for $\det A$. Then repeated application of Theorem 3, as rephrased above, shows that

$$
\begin{aligned}
|AB| &= |E_p \cdots E_1 B| = |E_p|\,|E_{p-1} \cdots E_1 B| = \cdots \\
&= |E_p| \cdots |E_1|\,|B| = \cdots = |E_p \cdots E_1|\,|B| \\
&= |A|\,|B|
\end{aligned}
$$

■


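PRACTICE PROBLEMS

1. Compute $\begin{vmatrix} 1 & -3 & 1 & -2 \\ 2 & -5 & -1 & -2 \\ 0 & -4 & 5 & 1 \\ -3 & 10 & -6 & 8 \end{vmatrix}$ in as few steps as possible.

2. Use a determinant to decide if $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are linearly independent, when

$$
\mathbf{v}_1 = \begin{bmatrix} 5 \\ -7 \\ 9 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -3 \\ 3 \\ -5 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 2 \\ -7 \\ 5 \end{bmatrix}
$$
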
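3.2 EXERCISES


Each equation in Exercises 1-4 illustrates a property of determinants. State the property.

1. $\begin{vmatrix} 0 & 5 & -2 \\ 1 & -3 & 6 \\ 4 & -1 & 8 \end{vmatrix} = -\begin{vmatrix} 1 & -3 & 6 \\ 0 & 5 & -2 \\ 4 & -1 & 8 \end{vmatrix}$

2. $\begin{vmatrix} 2 & -6 & 4 \\ 3 & 5 & -2 \\ 1 & 6 & 3 \end{vmatrix} = 2\begin{vmatrix} 1 & -3 & 2 \\ 3 & 5 & -2 \\ 1 & 6 & 3 \end{vmatrix}$

3. $\begin{vmatrix} 1 & 3 & -4 \\ 2 & 0 & -3 \\ 5 & -4 & 7 \end{vmatrix} = \begin{vmatrix} 1 & 3 & -4 \\ 0 & -6 & 5 \\ 5 & -4 & 7 \end{vmatrix}$

4. $\begin{vmatrix} 1 & 2 & 3 \\ 0 & 5 & -4 \\ 3 & 7 & 4 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 3 \\ 0 & 5 & -4 \\ 0 & 1 & -5 \end{vmatrix}$

Find the determinants in Exercises 5-10 by row reduction to echelon form.

5. $\begin{vmatrix} 1 & 5 & -6 \\ -1 & -4 & 4 \\ -2 & -7 & 9 \end{vmatrix}$

6. $\begin{vmatrix} 1 & 5 & -3 \\ 3 & -3 & 3 \\ 2 & 13 & -7 \end{vmatrix}$

7. $\begin{vmatrix} 1 & 3 & 0 & 2 \\ -2 & -5 & 7 & 4 \\ 3 & 5 & 2 & 1 \\ 1 & -1 & 2 & -3 \end{vmatrix}$

8. $\begin{vmatrix} 1 & 3 & 3 & -4 \\ 0 & 1 & 2 & -5 \\ 2 & 5 & 4 & -3 \\ -3 & -7 & -5 & 2 \end{vmatrix}$

9. $\begin{vmatrix} 1 & -1 & -3 & 0 \\ 0 & 1 & 5 & 4 \\ -1 & 2 & 8 & 5 \\ 3 & -1 & -2 & 3 \end{vmatrix}$

10. $\begin{vmatrix} 1 & 3 & -1 & 0 & -2 \\ 0 & 2 & -4 & -1 & -6 \\ -2 & -6 & 2 & 3 & 9 \\ 3 & 7 & -3 & 8 & -7 \\ 3 & 5 & 5 & 2 & 7 \end{vmatrix}$

Combine the methods of row reduction and cofactor expansion to compute the determinants in Exercises 11-14.

11. $\begin{vmatrix} 2 & 5 & -3 & -1 \\ 3 & 0 & 1 & -3 \\ -6 & 0 & -4 & 9 \\ 4 & 10 & -4 & -1 \end{vmatrix}$

12. $\begin{vmatrix} -1 & 2 & 3 & 0 \\ 3 & 4 & 3 & 0 \\ 5 & 4 & 6 & 6 \\ 4 & 2 & 4 & 3 \end{vmatrix}$

13. $\begin{vmatrix} 2 & 5 & 4 & 1 \\ 4 & 7 & 6 & 2 \\ 6 & -2 & -4 & 0 \\ -6 & 7 & 7 & 0 \end{vmatrix}$

14. $\begin{vmatrix} -3 & -2 & 1 & -4 \\ 1 & 3 & 0 & -3 \\ -3 & 4 & -2 & 8 \\ 3 & -4 & 0 & 4 \end{vmatrix}$

Find the determinants in Exercises 15-20, where $\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = 7$.

15. $\begin{vmatrix} a & b & c \\ d & e & f \\ 5g & 5h & 5i \end{vmatrix}$

16. $\begin{vmatrix} a & b & c \\ 3d & 3e & 3f \\ g & h & i \end{vmatrix}$

17. $\begin{vmatrix} a & b & c \\ g & h & i \\ d & e & f \end{vmatrix}$

18. $\begin{vmatrix} g & h & i \\ a & b & c \\ d & e & f \end{vmatrix}$

19. $\begin{vmatrix} a & b & c \\ 2d + a & 2e + b & 2f + c \\ g & h & i \end{vmatrix}$

20. $\begin{vmatrix} a + d & b + e & c + f \\ d & e & f \\ g & h & i \end{vmatrix}$

In Exercises 21-23, use determinants to find out if the matrix is invertible.

21. $\begin{bmatrix} 2 & 3 & 0 \\ 1 & 3 & 4 \\ 1 & 2 & 1 \end{bmatrix}$

22. $\begin{bmatrix} 5 & 0 & -1 \\ 1 & -3 & -2 \\ 0 & 5 & 3 \end{bmatrix}$

23. $\begin{bmatrix} 2 & 0 & 0 & 8 \\ 1 & -7 & -5 & 0 \\ 3 & 8 & 6 & 0 \\ 0 & 7 & 5 & 4 \end{bmatrix}$

In Exercises 24-26, use determinants to decide if the set of vectors is linearly independent.

24. $\begin{bmatrix} 4 \\ 6 \\ -7 \end{bmatrix}, \begin{bmatrix} -7 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} -3 \\ -5 \\ 6 \end{bmatrix}$

25. $\begin{bmatrix} 7 \\ -4 \\ -6 \end{bmatrix}, \begin{bmatrix} -8 \\ 5 \\ 7 \end{bmatrix}, \begin{bmatrix} 7 \\ 0 \\ -5 \end{bmatrix}$

26. $\begin{bmatrix} 3 \\ 5 \\ -6 \\ 4 \end{bmatrix}, \begin{bmatrix} 2 \\ -6 \\ 0 \\ 7 \end{bmatrix}, \begin{bmatrix} -2 \\ -1 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ -3 \end{bmatrix}$
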
In Exercises 27 and 28, $A$ and $B$ are $n \times n$ matrices. Mark each statement True or False. Justify each answer.

27. a. A row replacement operation does not affect the determinant of a matrix.

b. The determinant of $A$ is the product of the pivots in any echelon form $U$ of $A$, multiplied by $(-1)^r$, where $r$ is the number of row interchanges made during row reduction from $A$ to $U$.

c. If the columns of $A$ are linearly dependent, then $\det A = 0$.

d. $\det(A + B) = \det A + \det B$.

28. a. If two row interchanges are made in succession, then the new determinant equals the old determinant.

b. The determinant of $A$ is the product of the diagonal entries in $A$.

c. If $\det A$ is zero, then two rows or two columns are the same, or a row or a column is zero.

d. $\det A^T = (-1)\det A$.

29. Compute $\det B^5$, where $B = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix}$.

30. Use Theorem 3 (but not Theorem 4) to show that if two rows of a square matrix $A$ are equal, then $\det A = 0$. The same is true for two columns. Why?

In Exercises 31-36, mention an appropriate theorem in your explanation.

31. Show that if $A$ is invertible, then $\det A^{-1} = \dfrac{1}{\det A}$.

32. Find a formula for $\det(rA)$ when $A$ is an $n \times n$ matrix.

33. Let $A$ and $B$ be square matrices. Show that even though $AB$ and $BA$ may not be equal, it is always true that $\det AB = \det BA$.

34. Let $A$ and $P$ be square matrices, with $P$ invertible. Show that $\det(PAP^{-1}) = \det A$.

35. Let $U$ be a square matrix such that $U^T U = I$. Show that $\det U = \pm 1$.

36. Suppose that $A$ is a square matrix such that $\det A^4 = 0$. Explain why $A$ cannot be invertible.

Verify that $\det AB = (\det A)(\det B)$ for the matrices in Exercises 37 and 38. (Do not use Theorem 6.)

37. $A = \begin{bmatrix} 3 & 0 \\ 6 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 2 & 0 \\ 5 & 4 \end{bmatrix}$

38. $A = \begin{bmatrix} 3 & 6 \\ -1 & -2 \end{bmatrix}$, $B = \begin{bmatrix} 4 & 2 \\ -1 & -1 \end{bmatrix}$

39. Let $A$ and $B$ be $3 \times 3$ matrices, with $\det A = 4$ and $\det B = -3$. Use properties of determinants (in the text and in the exercises above) to compute:

a. $\det AB$ b. $\det 5A$ c. $\det B^T$ d. $\det A^{-1}$ e. $\det A^3$

40. Let $A$ and $B$ be $4 \times 4$ matrices, with $\det A = -1$ and $\det B = 2$. Compute:

a. $\det AB$ b. $\det B^5$ c. $\det 2A$ d. $\det A^T A$ e. $\det B^{-1}AB$

41. Verify that $\det A = \det B + \det C$, where

$$
A = \begin{bmatrix} a + e & b + f \\ c & d \end{bmatrix}, \quad
B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad
C = \begin{bmatrix} e & f \\ c & d \end{bmatrix}
$$

42. Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Show that $\det(A + B) = \det A + \det B$ if and only if $a + d = 0$.

43. Verify that $\det A = \det B + \det C$, where

$$
A = \begin{bmatrix} a_{11} & a_{12} & u_1 + v_1 \\ a_{21} & a_{22} & u_2 + v_2 \\ a_{31} & a_{32} & u_3 + v_3 \end{bmatrix}, \quad
B = \begin{bmatrix} a_{11} & a_{12} & u_1 \\ a_{21} & a_{22} & u_2 \\ a_{31} & a_{32} & u_3 \end{bmatrix}, \quad
C = \begin{bmatrix} a_{11} & a_{12} & v_1 \\ a_{21} & a_{22} & v_2 \\ a_{31} & a_{32} & v_3 \end{bmatrix}
$$

Note, however, that $A$ is not the same as $B + C$.

44. Right-multiplication by an elementary matrix $E$ affects the columns of $A$ in the same way that left-multiplication affects the rows. Use Theorems 5 and 3 and the obvious fact that $E^T$ is another elementary matrix to show that

$$\det AE = (\det E)(\det A)$$

Do not use Theorem 6.

45. [M] Compute $\det A^T A$ and $\det AA^T$ for several random $4 \times 5$ matrices and several random $5 \times 6$ matrices. What can you say about $A^T A$ and $AA^T$ when $A$ has more columns than rows?

46. [M] If $\det A$ is close to zero, is the matrix $A$ nearly singular? Experiment with the nearly singular $4 \times 4$ matrix $A$ in Exercise 9 of Section 2.3. Compute the determinants of $A$, $10A$, and $0.1A$. In contrast, compute the condition numbers of these matrices. Repeat these calculations when $A$ is the $4 \times 4$ identity matrix. Discuss your results.


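SOLUTIONS TO PRACTICE PROBLEMS


1. Perform row replacements to create zeros in the first column and then create a row of zeros.

$$
\begin{vmatrix} 1 & -3 & 1 & -2 \\ 2 & -5 & -1 & -2 \\ 0 & -4 & 5 & 1 \\ -3 & 10 & -6 & 8 \end{vmatrix}
= \begin{vmatrix} 1 & -3 & 1 & -2 \\ 0 & 1 & -3 & 2 \\ 0 & -4 & 5 & 1 \\ 0 & 1 & -3 & 2 \end{vmatrix}
= \begin{vmatrix} 1 & -3 & 1 & -2 \\ 0 & 1 & -3 & 2 \\ 0 & -4 & 5 & 1 \\ 0 & 0 & 0 & 0 \end{vmatrix} = 0
$$

2. Add row 1 to row 2, and then expand by cofactors of column 2:

$$
\begin{aligned}
\det [\, \mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3 \,]
&= \begin{vmatrix} 5 & -3 & 2 \\ -7 & 3 & -7 \\ 9 & -5 & 5 \end{vmatrix}
= \begin{vmatrix} 5 & -3 & 2 \\ -2 & 0 & -5 \\ 9 & -5 & 5 \end{vmatrix} && \text{Row 1 added to row 2} \\
&= -(-3)\begin{vmatrix} -2 & -5 \\ 9 & 5 \end{vmatrix} - (-5)\begin{vmatrix} 5 & 2 \\ -2 & -5 \end{vmatrix} && \text{Cofactors of column 2} \\
&= 3 \cdot (35) + 5 \cdot (-21) = 0
\end{aligned}
$$

By Theorem 4, the matrix $[\, \mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3 \,]$ is not invertible. The columns are linearly dependent, by the Invertible Matrix Theorem.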


3.3 CRAMER'S RULE, VOLUME, AND LINEAR TRANSFORMATIONS 

This section applies the theory of the preceding sections to obtain important theoretical 
formulas and a geometric interpretation of the determinant. 

Cramer’s Rule 

Cramer’s rule is needed in a variety of theoretical calculations. For instance, it can be 
used to study how the solution of Ax = b is affected by changes in the entries of b. 
However, the formula is inefficient for hand calculations, except for 2 x 2 or perhaps 
3x3 matrices. 

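For any $n \times n$ matrix $A$ and any $\mathbf{b}$ in $\mathbb{R}^n$, let $A_i(\mathbf{b})$ be the matrix obtained from $A$ by replacing column $i$ by the vector $\mathbf{b}$.

$$A_i(\mathbf{b}) = [\, \mathbf{a}_1 \ \cdots \ \underbrace{\mathbf{b}}_{\text{col } i} \ \cdots \ \mathbf{a}_n \,]$$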


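THEOREM 7 Cramer's Rule

Let $A$ be an invertible $n \times n$ matrix. For any $\mathbf{b}$ in $\mathbb{R}^n$, the unique solution $\mathbf{x}$ of $A\mathbf{x} = \mathbf{b}$ has entries given by

$$x_i = \frac{\det A_i(\mathbf{b})}{\det A}, \qquad i = 1, 2, \ldots, n \tag{1}$$


PROOF Denote the columns of $A$ by $\mathbf{a}_1, \ldots, \mathbf{a}_n$ and the columns of the $n \times n$ identity matrix $I$ by $\mathbf{e}_1, \ldots, \mathbf{e}_n$. If $A\mathbf{x} = \mathbf{b}$, the definition of matrix multiplication shows that

$$
\begin{aligned}
A \cdot I_i(\mathbf{x}) &= A[\, \mathbf{e}_1 \ \cdots \ \mathbf{x} \ \cdots \ \mathbf{e}_n \,] = [\, A\mathbf{e}_1 \ \cdots \ A\mathbf{x} \ \cdots \ A\mathbf{e}_n \,] \\
&= [\, \mathbf{a}_1 \ \cdots \ \mathbf{b} \ \cdots \ \mathbf{a}_n \,] = A_i(\mathbf{b})
\end{aligned}
$$

By the multiplicative property of determinants,

$$(\det A)(\det I_i(\mathbf{x})) = \det A_i(\mathbf{b})$$

The second determinant on the left is simply $x_i$. (Make a cofactor expansion along the $i$th row.) Hence $(\det A) \cdot x_i = \det A_i(\mathbf{b})$. This proves (1) because $A$ is invertible and $\det A \neq 0$. ■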


EXAMPLE 1 Use Cramer’s rule to solve the system

$$
\begin{aligned}
3x_1 - 2x_2 &= 6 \\
-5x_1 + 4x_2 &= 8
\end{aligned}
$$













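SOLUTION View the system as $A\mathbf{x} = \mathbf{b}$. Using the notation introduced above,

$$
A = \begin{bmatrix} 3 & -2 \\ -5 & 4 \end{bmatrix}, \quad
A_1(\mathbf{b}) = \begin{bmatrix} 6 & -2 \\ 8 & 4 \end{bmatrix}, \quad
A_2(\mathbf{b}) = \begin{bmatrix} 3 & 6 \\ -5 & 8 \end{bmatrix}
$$

Since $\det A = 2$, the system has a unique solution. By Cramer’s rule,

$$
\begin{aligned}
x_1 &= \frac{\det A_1(\mathbf{b})}{\det A} = \frac{24 + 16}{2} = 20 \\
x_2 &= \frac{\det A_2(\mathbf{b})}{\det A} = \frac{24 + 30}{2} = 27
\end{aligned}
$$

■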
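
Cramer's rule translates almost directly into code. The sketch below is for illustration only (the helper names `det` and `cramer` are invented here, and the determinant is computed by cofactor expansion for brevity); it reproduces the solution of Example 1 with exact fractions.

```python
from fractions import Fraction

def det(m):
    """Cofactor expansion across the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def cramer(A, b):
    """Solve Ax = b for invertible A using x_i = det A_i(b) / det A."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule requires an invertible matrix")
    solution = []
    for i in range(len(A)):
        # A_i(b): replace column i of A by the vector b.
        Ai = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(Ai), d))
    return solution

x = cramer([[3, -2], [-5, 4]], [6, 8])
print(x[0], x[1])  # 20 27
```

Each entry of the solution costs one extra determinant, which is why the method is used mainly for small systems and theoretical arguments.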


Application to Engineering 

A number of important engineering problems, particularly in electrical engineering and 
control theory, can be analyzed by Laplace transforms. This approach converts an 
appropriate system of linear differential equations into a system of linear algebraic 
equations whose coefficients involve a parameter. The next example illustrates the type 
of algebraic system that may arise. 


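EXAMPLE 2 Consider the following system in which $s$ is an unspecified parameter. Determine the values of $s$ for which the system has a unique solution, and use Cramer’s rule to describe the solution.

$$
\begin{aligned}
3sx_1 - 2x_2 &= 4 \\
-6x_1 + sx_2 &= 1
\end{aligned}
$$


SOLUTION View the system as $A\mathbf{x} = \mathbf{b}$. Then

$$
A = \begin{bmatrix} 3s & -2 \\ -6 & s \end{bmatrix}, \quad
A_1(\mathbf{b}) = \begin{bmatrix} 4 & -2 \\ 1 & s \end{bmatrix}, \quad
A_2(\mathbf{b}) = \begin{bmatrix} 3s & 4 \\ -6 & 1 \end{bmatrix}
$$

Since

$$\det A = 3s^2 - 12 = 3(s + 2)(s - 2)$$

the system has a unique solution precisely when $s \neq \pm 2$. For such an $s$, the solution is $(x_1, x_2)$, where

$$
\begin{aligned}
x_1 &= \frac{\det A_1(\mathbf{b})}{\det A} = \frac{4s + 2}{3(s + 2)(s - 2)} \\
x_2 &= \frac{\det A_2(\mathbf{b})}{\det A} = \frac{3s + 24}{3(s + 2)(s - 2)} = \frac{s + 8}{(s + 2)(s - 2)}
\end{aligned}
$$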
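
The formulas of Example 2 can be spot-checked by substituting them back into the system for a few admissible values of the parameter. The function name `solve_example2` and the sample values of $s$ below are arbitrary choices made for this sketch.

```python
from fractions import Fraction

def solve_example2(s):
    """Solution (x1, x2) of Example 2 for a given s, valid when s is not 2 or -2."""
    s = Fraction(s)
    d = 3 * s ** 2 - 12  # det A = 3(s + 2)(s - 2)
    if d == 0:
        raise ValueError("no unique solution when s = 2 or s = -2")
    x1 = (4 * s + 2) / d
    x2 = (3 * s + 24) / d
    return x1, x2

for s in (1, 3, Fraction(1, 2)):
    x1, x2 = solve_example2(s)
    assert 3 * s * x1 - 2 * x2 == 4   # first equation
    assert -6 * x1 + s * x2 == 1      # second equation
```

A check of this kind does not replace the algebra, but it catches sign and transcription errors quickly.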


A Formula for A -1 


Cramer’s rule leads easily to a general formula for the inverse of an $n \times n$ matrix $A$. The $j$th column of $A^{-1}$ is a vector $\mathbf{x}$ that satisfies

$$A\mathbf{x} = \mathbf{e}_j$$

where $\mathbf{e}_j$ is the $j$th column of the identity matrix, and the $i$th entry of $\mathbf{x}$ is the $(i, j)$-entry of $A^{-1}$. By Cramer’s rule,

$$\left\{(i, j)\text{-entry of } A^{-1}\right\} = x_i = \frac{\det A_i(\mathbf{e}_j)}{\det A} \tag{2}$$

Recall that $A_{ji}$ denotes the submatrix of $A$ formed by deleting row $j$ and column $i$. A cofactor expansion down column $i$ of $A_i(\mathbf{e}_j)$ shows that

$$\det A_i(\mathbf{e}_j) = (-1)^{i+j} \det A_{ji} = C_{ji} \tag{3}$$

where $C_{ji}$ is a cofactor of $A$. By (2), the $(i, j)$-entry of $A^{-1}$ is the cofactor $C_{ji}$ divided by $\det A$. [Note that the subscripts on $C_{ji}$ are the reverse of $(i, j)$.] Thus

$$
A^{-1} = \frac{1}{\det A}\begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix} \tag{4}
$$

The matrix of cofactors on the right side of (4) is called the adjugate (or classical adjoint) of $A$, denoted by $\operatorname{adj} A$. (The term adjoint also has another meaning in advanced texts on linear transformations.) The next theorem simply restates (4).


THEOREM 8 An Inverse Formula

Let $A$ be an invertible $n \times n$ matrix. Then

$$A^{-1} = \frac{1}{\det A} \operatorname{adj} A$$


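EXAMPLE 3 Find the inverse of the matrix $A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 1 \\ 1 & 4 & -2 \end{bmatrix}$.


SOLUTION The nine cofactors are

$$
\begin{aligned}
C_{11} &= +\begin{vmatrix} -1 & 1 \\ 4 & -2 \end{vmatrix} = -2, &
C_{12} &= -\begin{vmatrix} 1 & 1 \\ 1 & -2 \end{vmatrix} = 3, &
C_{13} &= +\begin{vmatrix} 1 & -1 \\ 1 & 4 \end{vmatrix} = 5 \\
C_{21} &= -\begin{vmatrix} 1 & 3 \\ 4 & -2 \end{vmatrix} = 14, &
C_{22} &= +\begin{vmatrix} 2 & 3 \\ 1 & -2 \end{vmatrix} = -7, &
C_{23} &= -\begin{vmatrix} 2 & 1 \\ 1 & 4 \end{vmatrix} = -7 \\
C_{31} &= +\begin{vmatrix} 1 & 3 \\ -1 & 1 \end{vmatrix} = 4, &
C_{32} &= -\begin{vmatrix} 2 & 3 \\ 1 & 1 \end{vmatrix} = 1, &
C_{33} &= +\begin{vmatrix} 2 & 1 \\ 1 & -1 \end{vmatrix} = -3
\end{aligned}
$$

The adjugate matrix is the transpose of the matrix of cofactors. [For instance, $C_{12}$ goes in the (2, 1) position.] Thus

$$
\operatorname{adj} A = \begin{bmatrix} -2 & 14 & 4 \\ 3 & -7 & 1 \\ 5 & -7 & -3 \end{bmatrix}
$$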
4 


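We could compute $\det A$ directly, but the following computation provides a check on the calculations above and produces $\det A$:

$$
(\operatorname{adj} A) \cdot A = \begin{bmatrix} -2 & 14 & 4 \\ 3 & -7 & 1 \\ 5 & -7 & -3 \end{bmatrix}\begin{bmatrix} 2 & 1 & 3 \\ 1 & -1 & 1 \\ 1 & 4 & -2 \end{bmatrix}
= \begin{bmatrix} 14 & 0 & 0 \\ 0 & 14 & 0 \\ 0 & 0 & 14 \end{bmatrix} = 14I
$$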


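Since $(\operatorname{adj} A)A = 14I$, Theorem 8 shows that $\det A = 14$ and

$$
A^{-1} = \frac{1}{14}\begin{bmatrix} -2 & 14 & 4 \\ 3 & -7 & 1 \\ 5 & -7 & -3 \end{bmatrix}
= \begin{bmatrix} -1/7 & 1 & 2/7 \\ 3/14 & -1/2 & 1/14 \\ 5/14 & -1/2 & -3/14 \end{bmatrix}
$$

■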
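
The construction in Theorem 8 can be sketched in a few lines of Python. The helper names (`det`, `adjugate`, `inverse`) are assumptions made for this illustration, and the code is meant for small matrices with $n \geq 2$ only; it reproduces the adjugate and inverse of Example 3.

```python
from fractions import Fraction

def det(m):
    """Cofactor expansion across the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def adjugate(A):
    """Transpose of the matrix of cofactors: entry (i, j) is C_ji."""
    n = len(A)

    def submatrix(r, c):
        # Delete row r and column c of A.
        return [row[:c] + row[c + 1:] for k, row in enumerate(A) if k != r]

    return [[(-1) ** (i + j) * det(submatrix(j, i)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """A^{-1} = (1 / det A) adj A, for invertible A."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(c, d) for c in row] for row in adjugate(A)]

A = [[2, 1, 3], [1, -1, 1], [1, 4, -2]]
print(adjugate(A))  # [[-2, 14, 4], [3, -7, 1], [5, -7, -3]]
print(det(A))       # 14
```

As with Cramer's rule, the number of determinants grows quickly with $n$, so this is a tool for checking small examples rather than a practical inversion routine.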
NUMERICAL NOTES

Theorem 8 is useful mainly for theoretical calculations. The formula for $A^{-1}$ permits one to deduce properties of the inverse without actually calculating it. Except for special cases, the algorithm in Section 2.2 gives a much better way to compute $A^{-1}$, if the inverse is really needed.

Cramer’s rule is also a theoretical tool. It can be used to study how sensitive the solution of $A\mathbf{x} = \mathbf{b}$ is to changes in an entry in $\mathbf{b}$ or in $A$ (perhaps due to experimental error when acquiring the entries for $\mathbf{b}$ or $A$). When $A$ is a $3 \times 3$ matrix with complex entries, Cramer’s rule is sometimes selected for hand computation because row reduction of $[\, A \ \ \mathbf{b} \,]$ with complex arithmetic can be messy, and the determinants are fairly easy to compute. For a larger $n \times n$ matrix (real or complex), Cramer’s rule is hopelessly inefficient. Computing just one determinant takes about as much work as solving $A\mathbf{x} = \mathbf{b}$ by row reduction.


Determinants as Area or Volume 

In the next application, we verify the geometric interpretation of determinants described in the chapter introduction. Although a general discussion of length and distance in $\mathbb{R}^n$ will not be given until Chapter 6, we assume here that the usual Euclidean concepts of length, area, and volume are already understood for $\mathbb{R}^2$ and $\mathbb{R}^3$.


THEOREM 9 If $A$ is a $2 \times 2$ matrix, the area of the parallelogram determined by the columns of $A$ is $|\det A|$. If $A$ is a $3 \times 3$ matrix, the volume of the parallelepiped determined by the columns of $A$ is $|\det A|$.


SG 


A Geometric Proof 
3-12 




FIGURE 1 Area = $|ad|$.


PROOF The theorem is obviously true for any $2 \times 2$ diagonal matrix:

$$
\left| \det \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} \right| = |ad| = \left\{ \begin{matrix} \text{area of} \\ \text{rectangle} \end{matrix} \right\}
$$

See Fig. 1. It will suffice to show that any $2 \times 2$ matrix $A = [\, \mathbf{a}_1 \ \mathbf{a}_2 \,]$ can be transformed into a diagonal matrix in a way that changes neither the area of the associated parallelogram nor $|\det A|$. From Section 3.2, we know that the absolute value of the determinant is unchanged when two columns are interchanged or a multiple of one column is added to another. And it is easy to see that such operations suffice to transform $A$ into a diagonal matrix. Column interchanges do not change the parallelogram at all. So it suffices to prove the following simple geometric observation that applies to vectors in $\mathbb{R}^2$ or $\mathbb{R}^3$:


Let $\mathbf{a}_1$ and $\mathbf{a}_2$ be nonzero vectors. Then for any scalar $c$, the area of the parallelogram determined by $\mathbf{a}_1$ and $\mathbf{a}_2$ equals the area of the parallelogram determined by $\mathbf{a}_1$ and $\mathbf{a}_2 + c\mathbf{a}_1$.


To prove this statement, we may assume that $\mathbf{a}_2$ is not a multiple of $\mathbf{a}_1$, for otherwise the two parallelograms would be degenerate and have zero area. If $L$ is the line through $\mathbf{0}$ and $\mathbf{a}_1$, then $\mathbf{a}_2 + L$ is the line through $\mathbf{a}_2$ parallel to $L$, and $\mathbf{a}_2 + c\mathbf{a}_1$ is on this line. See Fig. 2. The points $\mathbf{a}_2$ and $\mathbf{a}_2 + c\mathbf{a}_1$ have the same perpendicular distance to $L$. Hence the two parallelograms in Fig. 2 have the same area, since they share the base from $\mathbf{0}$ to $\mathbf{a}_1$. This completes the proof for $\mathbb{R}^2$.
FIGURE 2 Two parallelograms of equal area.


The proof for $\mathbb{R}^3$ is similar. The theorem is obviously true for a $3 \times 3$ diagonal matrix. See Fig. 3. And any $3 \times 3$ matrix $A$ can be transformed into a diagonal matrix using column operations that do not change $|\det A|$. (Think about doing row operations on $A^T$.) So it suffices to show that these operations do not affect the volume of the parallelepiped determined by the columns of $A$.

A parallelepiped is shown in Fig. 4 as a shaded box with two sloping sides. Its volume is the area of the base in the plane $\operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_3\}$ times the altitude of $\mathbf{a}_2$ above $\operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_3\}$. Any vector $\mathbf{a}_2 + c\mathbf{a}_1$ has the same altitude because $\mathbf{a}_2 + c\mathbf{a}_1$ lies in the plane $\mathbf{a}_2 + \operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_3\}$, which is parallel to $\operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_3\}$. Hence the volume of the parallelepiped is unchanged when $[\, \mathbf{a}_1 \ \mathbf{a}_2 \ \mathbf{a}_3 \,]$ is changed to $[\, \mathbf{a}_1 \ \ \mathbf{a}_2 + c\mathbf{a}_1 \ \ \mathbf{a}_3 \,]$. Thus a column replacement operation does not affect the volume of the parallelepiped. Since column interchanges have no effect on the volume, the proof is complete. ■


FIGURE 3 Volume = $|abc|$.

FIGURE 4 Two parallelepipeds of equal volume.


EXAMPLE 4 Calculate the area of the parallelogram determined by the points $(-2, -2)$, $(0, 3)$, $(4, -1)$, and $(6, 4)$. See Fig. 5(a).

SOLUTION First translate the parallelogram to one having the origin as a vertex. For example, subtract the vertex $(-2, -2)$ from each of the four vertices. The new parallelogram has the same area, and its vertices are $(0, 0)$, $(2, 5)$, $(6, 1)$, and $(8, 6)$. See Fig. 5(b). This parallelogram is determined by the columns of

$$A = \begin{bmatrix} 2 & 6 \\ 5 & 1 \end{bmatrix}$$

Since $|\det A| = |-28|$, the area of the parallelogram is 28. ■

FIGURE 5 Translating a parallelogram does not change its area.
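
The translate-then-take-the-determinant procedure of Example 4 can be written as a short function. The name `parallelogram_area` and its argument convention (one vertex followed by its two adjacent vertices) are choices made for this sketch.

```python
def parallelogram_area(p, q, r):
    """Area of the parallelogram with vertex p and adjacent vertices q and r."""
    # Translate so that p moves to the origin.
    a1 = (q[0] - p[0], q[1] - p[1])
    a2 = (r[0] - p[0], r[1] - p[1])
    # |det [a1 a2]|, with a1 and a2 as columns.
    return abs(a1[0] * a2[1] - a2[0] * a1[1])

print(parallelogram_area((-2, -2), (0, 3), (4, -1)))  # 28
```

The fourth vertex is not needed, because it is the sum of the two translated edge vectors added back to the starting vertex.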

Linear Transformations 

Determinants can be used to describe an important geometric property of linear transformations in the plane and in $\mathbb{R}^3$. If $T$ is a linear transformation and $S$ is a set in the domain of $T$, let $T(S)$ denote the set of images of points in $S$. We are interested in how the area (or volume) of $T(S)$ compares with the area (or volume) of the original set $S$. For convenience, when $S$ is a region bounded by a parallelogram, we also refer to $S$ as a parallelogram.

THEOREM 10 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation determined by a $2 \times 2$ matrix $A$. If $S$ is a parallelogram in $\mathbb{R}^2$, then

$$\{\text{area of } T(S)\} = |\det A| \cdot \{\text{area of } S\} \tag{5}$$

If $T$ is determined by a $3 \times 3$ matrix $A$, and if $S$ is a parallelepiped in $\mathbb{R}^3$, then

$$\{\text{volume of } T(S)\} = |\det A| \cdot \{\text{volume of } S\} \tag{6}$$


PROOF Consider the $2 \times 2$ case, with $A = [\, \mathbf{a}_1 \ \mathbf{a}_2 \,]$. A parallelogram at the origin in $\mathbb{R}^2$ determined by vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ has the form

$$S = \{ s_1\mathbf{b}_1 + s_2\mathbf{b}_2 : 0 \leq s_1 \leq 1, \ 0 \leq s_2 \leq 1 \}$$

The image of $S$ under $T$ consists of points of the form

$$
\begin{aligned}
T(s_1\mathbf{b}_1 + s_2\mathbf{b}_2) &= s_1T(\mathbf{b}_1) + s_2T(\mathbf{b}_2) \\
&= s_1A\mathbf{b}_1 + s_2A\mathbf{b}_2
\end{aligned}
$$

where $0 \leq s_1 \leq 1$, $0 \leq s_2 \leq 1$. It follows that $T(S)$ is the parallelogram determined by the columns of the matrix $[\, A\mathbf{b}_1 \ A\mathbf{b}_2 \,]$. This matrix can be written as $AB$, where $B = [\, \mathbf{b}_1 \ \mathbf{b}_2 \,]$. By Theorem 9 and the product theorem for determinants,

$$
\begin{aligned}
\{\text{area of } T(S)\} &= |\det AB| = |\det A| \cdot |\det B| \\
&= |\det A| \cdot \{\text{area of } S\}
\end{aligned} \tag{7}
$$

An arbitrary parallelogram has the form $\mathbf{p} + S$, where $\mathbf{p}$ is a vector and $S$ is a parallelogram at the origin, as above. It is easy to see that $T$ transforms $\mathbf{p} + S$ into $T(\mathbf{p}) + T(S)$. (See Exercise 26.) Since translation does not affect the area of a set,

$$
\begin{aligned}
\{\text{area of } T(\mathbf{p} + S)\} &= \{\text{area of } T(\mathbf{p}) + T(S)\} \\
&= \{\text{area of } T(S)\} && \text{Translation} \\
&= |\det A| \cdot \{\text{area of } S\} && \text{By equation (7)} \\
&= |\det A| \cdot \{\text{area of } \mathbf{p} + S\} && \text{Translation}
\end{aligned}
$$

This shows that (5) holds for all parallelograms in $\mathbb{R}^2$. The proof of (6) for the $3 \times 3$ case is analogous. ■
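
Equation (5) can be verified directly for a parallelogram at the origin by computing the area of $S$ and of $T(S)$ separately. The matrix `A` and the vectors `b1`, `b2` in the sketch below are made-up values used only for illustration.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def apply(A, v):
    """Image of the vector v under the mapping x -> Ax, for a 2 x 2 matrix A."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

def area(b1, b2):
    """Area of the parallelogram determined by b1 and b2 (Theorem 9)."""
    return abs(b1[0] * b2[1] - b2[0] * b1[1])

A = [[2, 1], [0, 3]]
b1, b2 = (1, 2), (3, 1)

area_S = area(b1, b2)                        # 5
area_TS = area(apply(A, b1), apply(A, b2))   # 30
print(area_TS, abs(det2(A)) * area_S)        # 30 30
```

The two numbers agree because the image parallelogram is determined by the columns of $AB$, as in the proof.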
When we attempt to generalize Theorem 10 to a region in $\mathbb{R}^2$ or $\mathbb{R}^3$ that is not bounded by straight lines or planes, we must face the problem of how to define and compute its area or volume. This is a question studied in calculus, and we shall only outline the basic idea for $\mathbb{R}^2$. If $R$ is a planar region that has a finite area, then $R$ can be approximated by a grid of small squares that lie inside $R$. By making the squares sufficiently small, the area of $R$ may be approximated as closely as desired by the sum of the areas of the small squares. See Fig. 6.

FIGURE 6 Approximating a planar region by a union of squares. The approximation improves as the grid becomes finer.


If $T$ is a linear transformation associated with a $2 \times 2$ matrix $A$, then the image of a planar region $R$ under $T$ is approximated by the images of the small squares inside $R$. The proof of Theorem 10 shows that each such image is a parallelogram whose area is $|\det A|$ times the area of the square. If $R'$ is the union of the squares inside $R$, then the area of $T(R')$ is $|\det A|$ times the area of $R'$. See Fig. 7. Also, the area of $T(R')$ is close to the area of $T(R)$. An argument involving a limiting process may be given to justify the following generalization of Theorem 10.


FIGURE 7 Approximating $T(R)$ by a union of parallelograms.


The conclusions of Theorem 10 hold whenever $S$ is a region in $\mathbb{R}^2$ with finite area or a region in $\mathbb{R}^3$ with finite volume.

EXAMPLE 5 Let $a$ and $b$ be positive numbers. Find the area of the region $E$ bounded by the ellipse whose equation is

$$\frac{x_1^2}{a^2} + \frac{x_2^2}{b^2} = 1$$


SOLUTION We claim that $E$ is the image of the unit disk $D$ under the linear transformation $T$ determined by the matrix $A = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$, because if $\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, and $\mathbf{x} = A\mathbf{u}$, then

$$u_1 = \frac{x_1}{a} \quad \text{and} \quad u_2 = \frac{x_2}{b}$$

It follows that $\mathbf{u}$ is in the unit disk, with $u_1^2 + u_2^2 \leq 1$, if and only if $\mathbf{x}$ is in $E$, with $(x_1/a)^2 + (x_2/b)^2 \leq 1$. By the generalization of Theorem 10,

$$
\begin{aligned}
\{\text{area of ellipse}\} &= \{\text{area of } T(D)\} \\
&= |\det A| \cdot \{\text{area of } D\} \\
&= ab \cdot \pi(1)^2 = \pi ab
\end{aligned}
$$

■
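
A numerical experiment can make the scaling in Example 5 concrete. The sketch below approximates the unit disk by a regular polygon (rather than by the grid of squares used in the text, purely for convenience), maps its vertices by $A$, and compares areas computed with the standard shoelace formula. The values $a = 3$, $b = 2$ and the number of sides are arbitrary.

```python
import math

def shoelace_area(points):
    """Area of a simple polygon from its vertices listed in order."""
    n = len(points)
    twice = sum(points[k][0] * points[(k + 1) % n][1]
                - points[(k + 1) % n][0] * points[k][1] for k in range(n))
    return abs(twice) / 2

def disk_and_image_areas(a, b, sides=2000):
    """Areas of a polygon inscribed in the unit circle and of its image
    under the mapping determined by A = [[a, 0], [0, b]]."""
    disk = [(math.cos(2 * math.pi * k / sides), math.sin(2 * math.pi * k / sides))
            for k in range(sides)]
    image = [(a * u1, b * u2) for (u1, u2) in disk]
    return shoelace_area(disk), shoelace_area(image)

disk_area, image_area = disk_and_image_areas(3.0, 2.0)
print(image_area / disk_area)            # very close to |det A| = 6
print(image_area, math.pi * 3.0 * 2.0)   # both close to 18.85
```

The ratio equals $|\det A|$ up to rounding for any number of sides, while the areas themselves approach $\pi$ and $\pi ab$ only as the polygon is refined.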

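PRACTICE PROBLEM

Let $S$ be the parallelogram determined by the vectors $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\mathbf{b}_2 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$, and let $A = \begin{bmatrix} 1 & -.1 \\ 0 & 2 \end{bmatrix}$. Compute the area of the image of $S$ under the mapping $\mathbf{x} \mapsto A\mathbf{x}$.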


3.3 EXERCISES 


Use Cramer’s rule to compute the solutions of the systems in 
Exercises 1-6. 


1. 5xi + 1x2 = 3 
2,Xi ~j - 4x2 = 1 

3. 3x\ _ 2x2 = 7 

— 5xi + 6 x 2 = — 5 

5. 2x\ + X2 =7 

—3xi + X3 = —8 

X 2 2x3 = _3 


2. Ax\ + x 2 = 6 
5xi + 2x 2 = 7 

4. — 5xi + 3x2 = 
— X2 — ' 


6. 2x\ + X2 + xs = 4 
— X\ -f- 2^3 = 2 

3xj -|- X2 ~h 3x3 — — 2 


In Exercises 7-10, determine the values of the parameter s for 
which the system has a unique solution, and describe the solution. 


7. 65^1 + 4x 2 
9xi + 2sx 2 

9. sxi — 2sx 2 = 
3x\ + 6sx 2 : 


5 8. 3sx\ — 5x 2 = 3 

—2 9x\ + 5^X2 = 2 

- 1 10. 2>sxi -f- X 2 — 1 

4 3sx\ + 65^2 = 2 


In Exercises 11-16, compute the adjugate of the given matrix, and 
then use Theorem 8 to give the inverse of the matrix. 



0 

-2 

-1 



"1 

1 

3" 

11. 

3 

0 

0 

12. 

2 

-2 

1 


-1 

1 

1 



_0 

1 

0_ 


"3 

5 

4" 



"3 

6 

7" 

13. 

1 

0 

1 


14. 

0 

2 

1 


2 

1 

1 



2 

3 

4 


3 

0 

0 


1 

2 

4 

-1 

1 

0 

16. 

0 

-3 

1 

-2 

3 

2 


0 

0 

3 


15. 


17. Show that if yl is 2 x 2, then Theorem 8 gives the same 
formula for A~ l as that given by Theorem 4 in Section 2.2. 

18. Suppose that all the entries in A are integers and det A = l. 
Explain why all the entries in A~ l are integers. 

In Exercises 19-22, find the area of the parallelogram whose 

vertices are listed. 

19. (0,0), (5,2), (6,4), (11,6) 

20. (0,0), (-1,3), (4,-5), (3,-2) 

21. (-1,0), (0,5), (1,-4), (2,1) 

22. (0, -2), (6,-1), (-3,1), (3,2) 

23. Find the volume of the parallelepiped with one vertex at 
the origin and adjacent vertices at (1,0, —2), (1,2,4), and 
(7,1,0). 

24. Find the volume of the parallelepiped with one vertex at 
the origin and adjacent vertices at (1,4,0), (—2, —5,2), and 
(— 1 , 2 , — 1 ). 

25. Use the concept of volume to explain why the determinant of 
a 3 x 3 matrix A is zero if and only if A is not invertible. Do 
not appeal to Theorem 4 in Section 3.2. [Hint: Think about 
the columns of 儿] 

26. Let T : ^ W l be a linear transformation, and let p be a 

vector and S a set in M m . Show that the image of p + 5 under 
T is the translated set T(p) + T(S) in W 1 . 
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27. Let S be the parallelogram determined by the vectors 



'-2" 


'-2" 


■ 6 -2" 

bi = 

3_ 

and b 2 = 

_ 5_ 

,and let A = 

-3 2_ 


Compute the area of the image of S under the mapping 
x Ax. 


28. Repeat Exercise 27 with bi = 





and 


29. Find a formula for the area of the triangle whose vertices are 
0 , Vi, and y 2 in R 2 . 

30. Let R be the triangle with vertices at (xi, ji), (X 2 , J 2 )，and 
(x 3 , j 3 ). Show that 

{area of triangle} = ^ det 

[Hint: Translate R to the origin by subtracting one of the 
vertices, and use Exercise 29.] 


ii 

yi 

Xl 

yi 

13 

ys 


31. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation determined
by the matrix $A = \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}$, where $a$, $b$, and $c$ are
positive numbers. Let $S$ be the unit ball, whose bounding
surface has the equation $x_1^2 + x_2^2 + x_3^2 = 1$.

a. Show that $T(S)$ is bounded by the ellipsoid with the
equation $\dfrac{x_1^2}{a^2} + \dfrac{x_2^2}{b^2} + \dfrac{x_3^2}{c^2} = 1$.

b. Use the fact that the volume of the unit ball is $4\pi/3$
to determine the volume of the region bounded by the
ellipsoid in part (a).


32. Let $S$ be the tetrahedron in $\mathbb{R}^3$ with vertices at the vectors 0,
$\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$, and let $S'$ be the tetrahedron with vertices at
vectors 0, $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. See the figure.

a. Describe a linear transformation that maps $S$ onto $S'$.

b. Find a formula for the volume of the tetrahedron $S'$ using
the fact that

{volume of $S$} = (1/3) {area of base} · {height}

33. [M] Test the inverse formula of Theorem 8 for a random
$4 \times 4$ matrix $A$. Use your matrix program to compute the
cofactors of the $3 \times 3$ submatrices, construct the adjugate,
and set $B = (\operatorname{adj} A)/(\det A)$. Then compute $B - \operatorname{inv}(A)$,
where $\operatorname{inv}(A)$ is the inverse of $A$ as computed by the matrix
program. Use floating point arithmetic with the maximum
possible number of decimal places. Report your results.

34. [M] Test Cramer's rule for a random $4 \times 4$ matrix $A$ and a
random $4 \times 1$ vector b. Compute each entry in the solution of
$A\mathbf{x} = \mathbf{b}$, and compare these entries with the entries in $A^{-1}\mathbf{b}$.
Write the command (or keystrokes) for your matrix program
that uses Cramer's rule to produce the second entry of x.

35. [M] If your version of MATLAB has the flops command,
use it to count the number of floating point operations to compute
$A^{-1}$ for a random $30 \times 30$ matrix. Compare this number
with the number of flops needed to form $(\operatorname{adj} A)/(\det A)$.


SOLUTION TO PRACTICE PROBLEM 


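The area of $S$ is $\left| \det \begin{bmatrix} 1 & 5 \\ 3 & 1 \end{bmatrix} \right| = 14$, and $\det A = 2$. By Theorem 10, the area of the
image of $S$ under the mapping $\mathbf{x} \mapsto A\mathbf{x}$ is

\[ |\det A| \cdot \{\text{area of } S\} = 2 \cdot 14 = 28 \]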
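The scaling rule used in this solution can be checked numerically. The sketch below is only an illustration: the vectors b1 = (1, 3) and b2 = (5, 1) are assumed values that give area 14, and the matrix `A` is a hypothetical integer matrix chosen so that det A = 2 (it is not claimed to be the matrix of the practice problem). The helper names `det2` and `matvec` are likewise made up for this example.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c

def matvec(m, v):
    """Product of a 2x2 matrix with a vector in R^2."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# Assumed edge vectors of the parallelogram S (illustrative values).
b1, b2 = [1, 3], [5, 1]
# Hypothetical matrix with det A = 2.
A = [[1, 1], [0, 2]]

# Area of S: |det [b1 b2]|, with b1 and b2 as columns.
area_S = abs(det2([[b1[0], b2[0]], [b1[1], b2[1]]]))

# Area of the image: |det [A b1  A b2]|.
c1, c2 = matvec(A, b1), matvec(A, b2)
area_image = abs(det2([[c1[0], c2[0]], [c1[1], c2[1]]]))

print(area_S, area_image, abs(det2(A)) * area_S)
```

Computing the image area directly from the transformed edge vectors and comparing it with |det A| times the original area is the content of Theorem 10 for parallelograms.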


CHAPTER 3 SUPPLEMENTARY EXERCISES 


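1. Mark each statement True or False. Justify each answer.
Assume that all matrices here are square.

a. If $A$ is a $2 \times 2$ matrix with a zero determinant, then one
column of $A$ is a multiple of the other.

b. If two rows of a $3 \times 3$ matrix $A$ are the same, then
$\det A = 0$.

c. If $A$ is a $3 \times 3$ matrix, then $\det 5A = 5 \det A$.

d. If $A$ and $B$ are $n \times n$ matrices, with $\det A = 2$ and
$\det B = 3$, then $\det(A + B) = 5$.

e. If $A$ is $n \times n$ and $\det A = 2$, then $\det A^3 = 6$.

f. If $B$ is produced by interchanging two rows of $A$, then
$\det B = \det A$.

g. If $B$ is produced by multiplying row 3 of $A$ by 5, then
$\det B = 5 \cdot \det A$.

h. If $B$ is formed by adding to one row of $A$ a linear
combination of the other rows, then $\det B = \det A$.

i. $\det A^T = -\det A$.

j. $\det(-A) = -\det A$.

k. $\det A^T A \ge 0$.

l. Any system of $n$ linear equations in $n$ variables can be
solved by Cramer's rule.

m. If u and v are in $\mathbb{R}^2$ and $\det [\,\mathbf{u} \ \ \mathbf{v}\,] = 10$, then the area
of the triangle in the plane with vertices at 0, u, and v is
10.

n. If $A^3 = 0$, then $\det A = 0$.

o. If $A$ is invertible, then $\det A^{-1} = \det A$.

p. If $A$ is invertible, then $(\det A)(\det A^{-1}) = 1$.

Use row operations to show that the determinants in Exercises 2-4
are all zero.

2. $\begin{vmatrix} 12 & 13 & 14 \\ 15 & 16 & 17 \\ 18 & 19 & 20 \end{vmatrix}$

3. $\begin{vmatrix} 1 & a & b + c \\ 1 & b & a + c \\ 1 & c & a + b \end{vmatrix}$

4. $\begin{vmatrix} a & b & c \\ a + x & b + x & c + x \\ a + y & b + y & c + y \end{vmatrix}$

Compute the determinants in Exercises 5 and 6.

5. $\begin{vmatrix} 9 & 1 & 9 & 9 & 9 \\ 9 & 0 & 9 & 9 & 2 \\ 4 & 0 & 0 & 5 & 0 \\ 9 & 0 & 3 & 9 & 0 \\ 6 & 0 & 0 & 7 & 0 \end{vmatrix}$

6. $\begin{vmatrix} 4 & 8 & 8 & 8 & 5 \\ 0 & 1 & 0 & 0 & 0 \\ 6 & 8 & 8 & 8 & 7 \\ 0 & 8 & 8 & 3 & 0 \\ 0 & 8 & 2 & 0 & 0 \end{vmatrix}$

7. Show that the equation of the line in $\mathbb{R}^2$ through distinct
points $(x_1, y_1)$ and $(x_2, y_2)$ can be written as

\[ \det \begin{bmatrix} 1 & x & y \\ 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \end{bmatrix} = 0 \]

8. Find a $3 \times 3$ determinant equation similar to that in Exercise 7
that describes the equation of the line through $(x_1, y_1)$ with
slope $m$.

Exercises 9 and 10 concern determinants of the following Vandermonde
matrices.

\[ T = \begin{bmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{bmatrix}, \qquad V(t) = \begin{bmatrix} 1 & t & t^2 & t^3 \\ 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ 1 & x_3 & x_3^2 & x_3^3 \end{bmatrix} \]

9. Use row operations to show that
$\det T = (b - a)(c - a)(c - b)$

10. Let $f(t) = \det V$, with $x_1$, $x_2$, $x_3$ all distinct. Explain why
$f(t)$ is a cubic polynomial, show that the coefficient of $t^3$ is
nonzero, and find three points on the graph of $f$.

11. Determine the area of the parallelogram determined by the
points (1, 4), (-1, 5), (3, 9), and (5, 8). How can you tell
that the quadrilateral determined by the points is actually a
parallelogram?

12. Use the concept of area of a parallelogram to write a statement
about a $2 \times 2$ matrix $A$ that is true if and only if $A$ is
invertible.

13. Show that if $A$ is invertible, then $\operatorname{adj} A$ is invertible, and

\[ (\operatorname{adj} A)^{-1} = \frac{1}{\det A} A \]

[Hint: Given matrices $B$ and $C$, what calculation(s) would
show that $C$ is the inverse of $B$?]

14. Let $A$, $B$, $C$, $D$, and $I$ be $n \times n$ matrices. Use the definition
or properties of a determinant to justify the following
formulas. Part (c) is useful in applications of eigenvalues
(Chapter 5).

a. $\det \begin{bmatrix} A & 0 \\ 0 & I \end{bmatrix} = \det A$

b. $\det \begin{bmatrix} I & 0 \\ C & D \end{bmatrix} = \det D$

c. $\det \begin{bmatrix} A & 0 \\ C & D \end{bmatrix} = (\det A)(\det D) = \det \begin{bmatrix} A & B \\ 0 & D \end{bmatrix}$

15. Let $A$, $B$, $C$, and $D$ be $n \times n$ matrices with $A$ invertible.

a. Find matrices $X$ and $Y$ to produce the block LU factorization

\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & 0 \\ X & I \end{bmatrix} \begin{bmatrix} A & B \\ 0 & Y \end{bmatrix} \]

and then show that

\[ \det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = (\det A) \cdot \det(D - CA^{-1}B) \]

b. Show that if $AC = CA$, then

\[ \det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(AD - CB) \]

16. Let $J$ be the $n \times n$ matrix of all 1's, and consider
$A = (a - b)I + bJ$; that is,

\[ A = \begin{bmatrix} a & b & b & \cdots & b \\ b & a & b & \cdots & b \\ b & b & a & \cdots & b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b & b & b & \cdots & a \end{bmatrix} \]

Confirm that $\det A = (a - b)^{n-1}[a + (n - 1)b]$ as follows:

a. Subtract row 2 from row 1, row 3 from row 2, and so on,
and explain why this does not change the determinant of
the matrix.

b. With the resulting matrix from part (a), add column 1 to
column 2, then add this new column 2 to column 3, and so
on, and explain why this does not change the determinant.

c. Find the determinant of the resulting matrix from (b).

17. Let $A$ be the original matrix given in Exercise 16, and let

\[ B = \begin{bmatrix} a - b & b & b & \cdots & b \\ 0 & a & b & \cdots & b \\ 0 & b & a & \cdots & b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & b & b & \cdots & a \end{bmatrix}, \qquad C = \begin{bmatrix} b & b & b & \cdots & b \\ b & a & b & \cdots & b \\ b & b & a & \cdots & b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b & b & b & \cdots & a \end{bmatrix} \]

Notice that $A$, $B$, and $C$ are nearly the same except that the
first column of $A$ equals the sum of the first columns of $B$
and $C$. A linearity property of the determinant function,
discussed in Section 3.2, says that $\det A = \det B + \det C$.
Use this fact to prove the formula in Exercise 16 by induction
on the size of matrix $A$.

18. [M] Apply the result of Exercise 16 to find the determinants
of the following matrices, and confirm your answers using a
matrix program.

\[ \begin{bmatrix} 3 & 8 & 8 & 8 \\ 8 & 3 & 8 & 8 \\ 8 & 8 & 3 & 8 \\ 8 & 8 & 8 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 8 & 3 & 3 & 3 & 3 \\ 3 & 8 & 3 & 3 & 3 \\ 3 & 3 & 8 & 3 & 3 \\ 3 & 3 & 3 & 8 & 3 \\ 3 & 3 & 3 & 3 & 8 \end{bmatrix} \]

19. [M] Use a matrix program to compute the determinants of the
following matrices.

\[ \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 & 3 \\ 1 & 2 & 3 & 4 & 4 \\ 1 & 2 & 3 & 4 & 5 \end{bmatrix} \]

Use the results to guess the determinant of the matrix below,
and confirm your guess by using row operations to evaluate
that determinant.

\[ \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{bmatrix} \]

20. [M] Use the method of Exercise 19 to guess the determinant
of

\[ \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 3 & 3 & \cdots & 3 \\ 1 & 3 & 6 & \cdots & 6 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 3 & 6 & \cdots & 3(n - 1) \end{bmatrix} \]

Justify your conjecture. [Hint: Use Exercise 14(c) and the
result of Exercise 19.]
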
Vector Spaces


INTRODUCTORY EXAMPLE 

Space Flight and Control Systems 

Twelve stories high and weighing 75 tons, Columbia rose 
majestically off the launching pad on a cool Palm Sunday 
morning in April 1981. A product of ten years’ intensive 
research and development, the first U.S. space shuttle was a 
triumph of control systems engineering design, involving 
many branches of engineering — aeronautical, chemical, 
electrical, hydraulic, and mechanical. 

The space shuttle's control systems are absolutely
critical for flight. Because the shuttle is an unstable
airframe, it requires constant computer monitoring during
atmospheric flight. The flight control system sends a
stream of commands to aerodynamic control surfaces and
44 small thruster jets. Figure 1 shows a typical closed-loop
feedback system that controls the pitch of the shuttle
during flight. (The pitch is the elevation angle of the nose
cone.) The junction symbols (⊗) show where signals
from various sensors are added to the computer signals
flowing along the top of the figure.

Mathematically, the input and output signals to an
engineering system are functions. It is important in
applications that these functions can be added, as in
Fig. 1, and multiplied by scalars. These two operations
on functions have algebraic properties that are completely
analogous to the operations of adding vectors in $\mathbb{R}^n$
and multiplying a vector by a scalar, as we shall see
in Sections 4.1 and 4.8. For this reason, the set of all
possible inputs (functions) is called a vector space. The
mathematical foundation for systems engineering rests
on vector spaces of functions, and Chapter 4 extends the
theory of vectors in $\mathbb{R}^n$ to include such functions. Later on,
you will see how other vector spaces arise in engineering,
physics, and statistics.

FIGURE 1 Pitch control system for the space shuttle. (Source: Adapted from Space Shuttle GN&C Operations
Manual, Rockwell International, ©1988.)





The mathematical seeds planted in Chapters 1 and 2 germinate and begin to blossom
in this chapter. The beauty and power of linear algebra will be seen more clearly when
you view $\mathbb{R}^n$ as only one of a variety of vector spaces that arise naturally in applied
problems. Actually, a study of vector spaces is not much different from a study of $\mathbb{R}^n$
itself, because you can use your geometric experience with $\mathbb{R}^2$ and $\mathbb{R}^3$ to visualize many
general concepts.

Beginning with basic definitions in Section 4.1, the general vector space framework
develops gradually throughout the chapter. A goal of Sections 4.3-4.5 is to demonstrate
how closely other vector spaces resemble $\mathbb{R}^n$. Section 4.6 on rank is one of the high
points of the chapter, using vector space terminology to tie together important facts about
rectangular matrices. Section 4.8 applies the theory of the chapter to discrete signals and
difference equations used in digital control systems such as in the space shuttle. Markov
chains, in Section 4.9, provide a change of pace from the more theoretical sections of
the chapter and make good examples for concepts to be introduced in Chapter 5.


4.1 VECTOR SPACES AND SUBSPACES 


Much of the theory in Chapters 1 and 2 rested on certain simple and obvious algebraic
properties of $\mathbb{R}^n$ listed in Section 1.3. In fact, many other mathematical systems
have the same properties. The specific properties of interest are listed in the following
definition.


DEFINITION A vector space is a nonempty set V of objects, called vectors, on which are defined
two operations, called addition and multiplication by scalars (real numbers),
subject to the ten axioms (or rules) listed below.¹ The axioms must hold for all
vectors u, v, and w in V and for all scalars c and d.

1. The sum of u and v, denoted by u + v, is in V.

2. u + v = v + u.

3. (u + v) + w = u + (v + w).

4. There is a zero vector 0 in V such that u + 0 = u.

5. For each u in V, there is a vector -u in V such that u + (-u) = 0.

6. The scalar multiple of u by c, denoted by cu, is in V.

7. c(u + v) = cu + cv.

8. (c + d)u = cu + du.

9. c(du) = (cd)u.

10. 1u = u.


¹ Technically, V is a real vector space. All of the theory in this chapter also holds for a complex vector space
in which the scalars are complex numbers. We will look at this briefly in Chapter 5. Until then, all scalars
are assumed to be real.
Using only these axioms, one can show that the zero vector in Axiom 4 is unique,
and the vector -u, called the negative of u, in Axiom 5 is unique for each u in V.
See Exercises 25 and 26. Proofs of the following simple facts are also outlined in the
exercises:


For each u in V and scalar c,

0u = 0        (1)
c0 = 0        (2)
-u = (-1)u    (3)

EXAMPLE 1 The spaces $\mathbb{R}^n$, where $n \ge 1$, are the premier examples of vector
spaces. The geometric intuition developed for $\mathbb{R}^3$ will help you understand and visualize
many concepts throughout the chapter. ■



FIGURE 1 


EXAMPLE 2 Let V be the set of all arrows (directed line segments) in three-dimensional
space, with two arrows regarded as equal if they have the same length and point
in the same direction. Define addition by the parallelogram rule (from Section 1.3),
and for each v in V, define cv to be the arrow whose length is |c| times the length of
v, pointing in the same direction as v if c ≥ 0 and otherwise pointing in the opposite
direction. (See Fig. 1.) Show that V is a vector space. This space is a common model
in physical problems for various forces.


SOLUTION The definition of V is geometric, using concepts of length and direction.
No xyz-coordinate system is involved. An arrow of zero length is a single point and
represents the zero vector. The negative of v is (-1)v. So Axioms 1, 4, 5, 6, and 10 are
evident. The rest are verified by geometry. For instance, see Figs. 2 and 3. ■



FIGURE 2 u + v = v + u.



FIGURE 3 (u + v) + w = u + (v + w).


EXAMPLE 3 Let $\mathbb{S}$ be the space of all doubly infinite sequences of numbers (usually
written in a row rather than a column):

\[ \{y_k\} = (\ldots, y_{-2}, y_{-1}, y_0, y_1, y_2, \ldots) \]

If $\{z_k\}$ is another element of $\mathbb{S}$, then the sum $\{y_k\} + \{z_k\}$ is the sequence $\{y_k + z_k\}$
formed by adding corresponding terms of $\{y_k\}$ and $\{z_k\}$. The scalar multiple $c\{y_k\}$ is
the sequence $\{cy_k\}$. The vector space axioms are verified in the same way as for $\mathbb{R}^n$.

Elements of $\mathbb{S}$ arise in engineering, for example, whenever a signal is measured (or
sampled) at discrete times. A signal might be electrical, mechanical, optical, and so on.
The major control systems for the space shuttle, mentioned in the chapter introduction,
use discrete (or digital) signals. For convenience, we will call $\mathbb{S}$ the space of (discrete-time)
signals. A signal may be visualized by a graph as in Fig. 4. ■
FIGURE 4 A discrete-time signal.

FIGURE 5 The sum of two vectors (functions).


EXAMPLE 4 For $n \ge 0$, the set $\mathbb{P}_n$ of polynomials of degree at most $n$ consists of
all polynomials of the form

\[ \mathbf{p}(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n \tag{4} \]

where the coefficients $a_0, \ldots, a_n$ and the variable $t$ are real numbers. The degree of
p is the highest power of $t$ in (4) whose coefficient is not zero. If $\mathbf{p}(t) = a_0 \ne 0$, the
degree of p is zero. If all the coefficients are zero, p is called the zero polynomial. The
zero polynomial is included in $\mathbb{P}_n$ even though its degree, for technical reasons, is not
defined.

If p is given by (4) and if $\mathbf{q}(t) = b_0 + b_1 t + \cdots + b_n t^n$, then the sum p + q is
defined by

\[ (\mathbf{p} + \mathbf{q})(t) = \mathbf{p}(t) + \mathbf{q}(t) = (a_0 + b_0) + (a_1 + b_1)t + \cdots + (a_n + b_n)t^n \]

The scalar multiple cp is the polynomial defined by

\[ (c\mathbf{p})(t) = c\mathbf{p}(t) = ca_0 + (ca_1)t + \cdots + (ca_n)t^n \]

These definitions satisfy Axioms 1 and 6 because p + q and cp are polynomials
of degree less than or equal to $n$. Axioms 2, 3, and 7-10 follow from properties of the
real numbers. Clearly, the zero polynomial acts as the zero vector in Axiom 4. Finally,
(-1)p acts as the negative of p, so Axiom 5 is satisfied. Thus $\mathbb{P}_n$ is a vector space.

The vector spaces $\mathbb{P}_n$ for various $n$ are used, for instance, in statistical trend analysis
of data, discussed in Section 6.8. ■
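The operations in Example 4 can be tried out by storing a polynomial of degree at most n as its list of n + 1 coefficients. This is only a sketch under that representation; the function names `poly_add`, `poly_scale`, and `poly_eval` and the sample polynomials are invented for the illustration.

```python
def poly_add(p, q):
    """Add two polynomials stored as coefficient lists [a0, a1, ..., an]."""
    assert len(p) == len(q), "both polynomials must live in the same P_n"
    return [a + b for a, b in zip(p, q)]

def poly_scale(c, p):
    """Multiply every coefficient of p by the scalar c."""
    return [c * a for a in p]

def poly_eval(p, t):
    """Evaluate a0 + a1*t + ... + an*t^n at the number t."""
    return sum(a * t**k for k, a in enumerate(p))

# Two sample elements of P_2 (illustrative values).
p = [1, 0, 3]     # 1 + 3t^2
q = [2, -1, -3]   # 2 - t - 3t^2

s = poly_add(p, q)                    # coefficients of p + q
zero = poly_add(p, poly_scale(-1, p)) # p + (-1)p

print(s, poly_scale(2, p), zero)
```

Note that p + q here has degree 1, lower than the degree of either summand, yet its coefficient list still has length 3, so the sum stays in the same space. That is the closure property required by Axiom 1.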


EXAMPLE 5 Let V be the set of all real-valued functions defined on a set $\mathbb{D}$. (Typically,
$\mathbb{D}$ is the set of real numbers or some interval on the real line.) Functions are added
in the usual way: f + g is the function whose value at $t$ in the domain $\mathbb{D}$ is $\mathbf{f}(t) + \mathbf{g}(t)$.
Likewise, for a scalar $c$ and an f in V, the scalar multiple cf is the function whose value
at $t$ is $c\mathbf{f}(t)$. For instance, if $\mathbb{D} = \mathbb{R}$, $\mathbf{f}(t) = 1 + \sin 2t$, and $\mathbf{g}(t) = 2 + .5t$, then

\[ (\mathbf{f} + \mathbf{g})(t) = 3 + \sin 2t + .5t \quad \text{and} \quad (2\mathbf{g})(t) = 4 + t \]

Two functions in V are equal if and only if their values are equal for every $t$ in $\mathbb{D}$.
Hence the zero vector in V is the function that is identically zero, $\mathbf{f}(t) = 0$ for all $t$, and
the negative of f is (-1)f. Axioms 1 and 6 are obviously true, and the other axioms
follow from properties of the real numbers, so V is a vector space. ■
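The pointwise operations of Example 5 can be mirrored directly with functions that return functions. The sketch below uses the f and g from the example; the helper names `add` and `scale` and the sample points are assumptions made for the illustration.

```python
import math

def add(f, g):
    """Return the function f + g, defined pointwise."""
    return lambda t: f(t) + g(t)

def scale(c, f):
    """Return the function c*f, defined pointwise."""
    return lambda t: c * f(t)

f = lambda t: 1 + math.sin(2 * t)
g = lambda t: 2 + 0.5 * t

f_plus_g = add(f, g)
two_g = scale(2, g)

# Compare with the closed forms stated in the example at a few sample points.
samples = [-2.0, 0.0, 0.7, 3.5]
max_err_sum = max(abs(f_plus_g(t) - (3 + math.sin(2 * t) + 0.5 * t)) for t in samples)
max_err_scale = max(abs(two_g(t) - (4 + t)) for t in samples)
print(max_err_sum, max_err_scale)
```

Each of `f_plus_g` and `two_g` is a single object that can be passed around and combined again, which matches the point of view that a function is one "point" of the vector space.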

It is important to think of each function in the vector space V of Example 5 as a
single object, as just one "point" or vector in the vector space. The sum of two vectors
f and g (functions in V, or elements of any vector space) can be visualized as in Fig. 5,
because this can help you carry over to a general vector space the geometric intuition
you have developed while working with the vector space $\mathbb{R}^n$. See the Study Guide for
help as you learn to adopt this more general point of view.
Subspaces

In many problems, a vector space consists of an appropriate subset of vectors from some 
larger vector space. In this case, only three of the ten vector space axioms need to be 
checked; the rest are automatically satisfied. 


A subspace of a vector space V is a subset H of V that has three properties:

a. The zero vector of V is in H.²

b. H is closed under vector addition. That is, for each u and v in H, the sum
u + v is in H.

c. H is closed under multiplication by scalars. That is, for each u in H and each
scalar c, the vector cu is in H.



FIGURE 6 

A subspace of V. 


Properties (a), (b), and (c) guarantee that a subspace H of V is itself a vector
space, under the vector space operations already defined in V. To verify this, note
that properties (a), (b), and (c) are Axioms 1, 4, and 6. Axioms 2, 3, and 7-10 are
automatically true in H because they apply to all elements of V, including those in H.
Axiom 5 is also true in H, because if u is in H, then (-1)u is in H by property (c), and
we know from equation (3) on page 191 that (-1)u is the vector -u in Axiom 5.

So every subspace is a vector space. Conversely, every vector space is a subspace
(of itself and possibly of other larger spaces). The term subspace is used when at least
two vector spaces are in mind, with one inside the other, and the phrase subspace of V
identifies V as the larger space. (See Fig. 6.)

EXAMPLE 6 The set consisting of only the zero vector in a vector space V is a
subspace of V, called the zero subspace and written as {0}. ■



FIGURE 7 

The $x_1x_2$-plane as a subspace of $\mathbb{R}^3$.


EXAMPLE 7 Let $\mathbb{P}$ be the set of all polynomials with real coefficients, with operations
in $\mathbb{P}$ defined as for functions. Then $\mathbb{P}$ is a subspace of the space of all real-valued
functions defined on $\mathbb{R}$. Also, for each $n \ge 0$, $\mathbb{P}_n$ is a subspace of $\mathbb{P}$, because $\mathbb{P}_n$ is a
subset of $\mathbb{P}$ that contains the zero polynomial, the sum of two polynomials in $\mathbb{P}_n$ is also
in $\mathbb{P}_n$, and a scalar multiple of a polynomial in $\mathbb{P}_n$ is also in $\mathbb{P}_n$. ■

EXAMPLE 8 The vector space $\mathbb{R}^2$ is not a subspace of $\mathbb{R}^3$ because $\mathbb{R}^2$ is not even a
subset of $\mathbb{R}^3$. (The vectors in $\mathbb{R}^3$ all have three entries, whereas the vectors in $\mathbb{R}^2$ have
only two.) The set

\[ H = \left\{ \begin{bmatrix} s \\ t \\ 0 \end{bmatrix} : s \text{ and } t \text{ are real} \right\} \]

is a subset of $\mathbb{R}^3$ that "looks" and "acts" like $\mathbb{R}^2$, although it is logically distinct from
$\mathbb{R}^2$. See Fig. 7. Show that H is a subspace of $\mathbb{R}^3$.

SOLUTION The zero vector is in H, and H is closed under vector addition and scalar
multiplication because these operations on vectors in H always produce vectors whose
third entries are zero (and so belong to H). Thus H is a subspace of $\mathbb{R}^3$. ■


² Some texts replace property (a) in this definition by the assumption that H is nonempty. Then (a) could be
deduced from (c) and the fact that 0u = 0. But the best way to test for a subspace is to look first for the zero
vector. If 0 is in H, then properties (b) and (c) must be checked. If 0 is not in H, then H cannot be a
subspace and the other properties need not be checked.
FIGURE 8 A line that is not a vector space.

FIGURE 9 An example of a subspace.


THEOREM 1 


EXAMPLE 9 A plane in $\mathbb{R}^3$ not through the origin is not a subspace of $\mathbb{R}^3$, because
the plane does not contain the zero vector of $\mathbb{R}^3$. Similarly, a line in $\mathbb{R}^2$ not through the
origin, such as in Fig. 8, is not a subspace of $\mathbb{R}^2$. ■
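A quick numerical check makes the failure in Example 9 concrete. The line used below, $x_2 = x_1 + 1$, is a hypothetical choice (the line drawn in Fig. 8 is not specified in the text), and the helper names are invented for this sketch.

```python
def on_line(point):
    """Membership test for the hypothetical line x2 = x1 + 1 in R^2."""
    x1, x2 = point
    return x2 == x1 + 1

def add(u, v):
    """Componentwise sum of two vectors in R^2."""
    return (u[0] + v[0], u[1] + v[1])

def scale(c, u):
    """Scalar multiple of a vector in R^2."""
    return (c * u[0], c * u[1])

# Two points that do lie on the line.
u, v = (0, 1), (2, 3)

checks = {
    "zero vector on line": on_line((0, 0)),
    "u + v on line": on_line(add(u, v)),
    "2u on line": on_line(scale(2, u)),
}
print(checks)
```

For this particular line all three subspace properties fail, although failing any one of them (most simply, the missing zero vector) is already enough to rule out a subspace.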

A Subspace Spanned by a Set 

The next example illustrates one of the most common ways of describing a subspace.
As in Chapter 1, the term linear combination refers to any sum of scalar multiples of
vectors, and Span{$\mathbf{v}_1, \ldots, \mathbf{v}_p$} denotes the set of all vectors that can be written as linear
combinations of $\mathbf{v}_1, \ldots, \mathbf{v}_p$.

EXAMPLE 10 Given $\mathbf{v}_1$ and $\mathbf{v}_2$ in a vector space V, let H = Span{$\mathbf{v}_1, \mathbf{v}_2$}. Show
that H is a subspace of V.

SOLUTION The zero vector is in H, since $\mathbf{0} = 0\mathbf{v}_1 + 0\mathbf{v}_2$. To show that H is closed
under vector addition, take two arbitrary vectors in H, say,

\[ \mathbf{u} = s_1\mathbf{v}_1 + s_2\mathbf{v}_2 \quad \text{and} \quad \mathbf{w} = t_1\mathbf{v}_1 + t_2\mathbf{v}_2 \]

By Axioms 2, 3, and 8 for the vector space V,

\[ \mathbf{u} + \mathbf{w} = (s_1\mathbf{v}_1 + s_2\mathbf{v}_2) + (t_1\mathbf{v}_1 + t_2\mathbf{v}_2) = (s_1 + t_1)\mathbf{v}_1 + (s_2 + t_2)\mathbf{v}_2 \]

So u + w is in H. Furthermore, if c is any scalar, then by Axioms 7 and 9,

\[ c\mathbf{u} = c(s_1\mathbf{v}_1 + s_2\mathbf{v}_2) = (cs_1)\mathbf{v}_1 + (cs_2)\mathbf{v}_2 \]

which shows that cu is in H and H is closed under scalar multiplication. Thus H is a
subspace of V. ■
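The two identities in the solution of Example 10 can be spot-checked with concrete numbers. In this sketch the vectors `v1`, `v2` in $\mathbb{R}^3$, the weights, and the scalar are arbitrary illustrative choices, and `combo`, `vec_add`, and `vec_scale` are names made up for the example.

```python
def combo(weights, vectors):
    """Linear combination weights[0]*vectors[0] + weights[1]*vectors[1] + ..."""
    n = len(vectors[0])
    return tuple(sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(n))

def vec_add(u, w):
    """Componentwise sum of two vectors."""
    return tuple(a + b for a, b in zip(u, w))

def vec_scale(c, u):
    """Scalar multiple of a vector."""
    return tuple(c * a for a in u)

# Hypothetical spanning vectors, weights, and scalar.
v1, v2 = (1, 0, 2), (-1, 3, 1)
s, t, c = (2, -1), (4, 5), 3

u = combo(s, [v1, v2])   # u = s1*v1 + s2*v2
w = combo(t, [v1, v2])   # w = t1*v1 + t2*v2

# u + w should equal (s1 + t1)*v1 + (s2 + t2)*v2.
sum_ok = vec_add(u, w) == combo((s[0] + t[0], s[1] + t[1]), [v1, v2])
# c*u should equal (c*s1)*v1 + (c*s2)*v2.
scale_ok = vec_scale(c, u) == combo((c * s[0], c * s[1]), [v1, v2])
print(u, w, sum_ok, scale_ok)
```

A numerical check of this kind only tests particular values; the proof in the example is what establishes closure for all weights and all scalars.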

In Section 4.5, you will see that every nonzero subspace of $\mathbb{R}^3$, other than $\mathbb{R}^3$ itself,
is either Span{$\mathbf{v}_1, \mathbf{v}_2$} for some linearly independent $\mathbf{v}_1$ and $\mathbf{v}_2$ or Span{v} for v ≠ 0. In
the first case, the subspace is a plane through the origin; in the second case, it is a line
through the origin. (See Fig. 9.) It is helpful to keep these geometric pictures in mind,
even for an abstract vector space.

The argument in Example 10 can easily be generalized to prove the following 
theorem. 


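If $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are in a vector space V, then Span{$\mathbf{v}_1, \ldots, \mathbf{v}_p$} is a subspace of V.


We call Span{$\mathbf{v}_1, \ldots, \mathbf{v}_p$} the subspace spanned (or generated) by {$\mathbf{v}_1, \ldots, \mathbf{v}_p$}.
Given any subspace H of V, a spanning (or generating) set for H is a set {$\mathbf{v}_1, \ldots, \mathbf{v}_p$}
in H such that H = Span{$\mathbf{v}_1, \ldots, \mathbf{v}_p$}.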

The next example shows how to use Theorem 1. 


EXAMPLE 11 Let H be the set of all vectors of the form (a - 3b, b - a, a, b),
where a and b are arbitrary scalars. That is, let H = {(a - 3b, b - a, a, b) : a and b in
$\mathbb{R}$}. Show that H is a subspace of $\mathbb{R}^4$.

SOLUTION Write the vectors in H as column vectors. Then an arbitrary vector in H
has the form

\[ \begin{bmatrix} a - 3b \\ b - a \\ a \\ b \end{bmatrix} = a \underbrace{\begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}_1} + b \underbrace{\begin{bmatrix} -3 \\ 1 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}_2} \]

This calculation shows that H = Span{$\mathbf{v}_1, \mathbf{v}_2$}, where $\mathbf{v}_1$ and $\mathbf{v}_2$ are the vectors indicated
above. Thus H is a subspace of $\mathbb{R}^4$ by Theorem 1. ■
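The decomposition in Example 11 is easy to confirm by brute force over a small grid of scalars. The sketch below assumes the spanning vectors $\mathbf{v}_1 = (1, -1, 1, 0)$ and $\mathbf{v}_2 = (-3, 1, 0, 1)$ read off from the form (a - 3b, b - a, a, b); the function names are invented for the illustration.

```python
def element_of_H(a, b):
    """The vector (a - 3b, b - a, a, b) described in Example 11."""
    return (a - 3 * b, b - a, a, b)

# Spanning vectors obtained by separating the a-part from the b-part.
v1 = (1, -1, 1, 0)
v2 = (-3, 1, 0, 1)

def combo(a, b):
    """The linear combination a*v1 + b*v2."""
    return tuple(a * x + b * y for x, y in zip(v1, v2))

# Compare the two descriptions for every integer pair in a small grid.
all_match = all(
    element_of_H(a, b) == combo(a, b)
    for a in range(-3, 4)
    for b in range(-3, 4)
)
print(all_match)
```

Reading off $\mathbf{v}_1$ and $\mathbf{v}_2$ amounts to setting (a, b) = (1, 0) and (a, b) = (0, 1) in the general form, which is a handy way to find a spanning set for sets described by free parameters.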

Example 11 illustrates a useful technique of expressing a subspace H as the set
of linear combinations of some small collection of vectors. If H = Span{$\mathbf{v}_1, \ldots, \mathbf{v}_p$},
we can think of the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in the spanning set as "handles" that allow us to
hold on to the subspace H. Calculations with the infinitely many vectors in H are often
reduced to operations with the finite number of vectors in the spanning set.

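EXAMPLE 12 For what value(s) of h will y be in the subspace of $\mathbb{R}^3$ spanned by
$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$, if

\[ \mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 5 \\ -4 \\ -7 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{y} = \begin{bmatrix} -4 \\ 3 \\ h \end{bmatrix} \]

SOLUTION This question is Practice Problem 2 in Section 1.3, written here with the
term subspace rather than Span{$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$}. The solution there shows that y is in
Span{$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$} if and only if h = 5. That solution is worth reviewing now, along
with Exercises 11-16 and 19-21 in Section 1.3. ■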
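The condition h = 5 can be cross-checked by a rank comparison: y lies in the span exactly when adjoining y as an extra column does not raise the rank of the matrix whose columns are $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$. The sketch below is one way to do this with exact rational arithmetic; it is not the method of the cited practice problem, the vectors are taken as read from Example 12, and `rank` and `in_span` are hypothetical helper names.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def in_span(vectors, y):
    """True if y is a linear combination of the given vectors."""
    coeff = [list(row) for row in zip(*vectors)]        # vectors as columns
    augmented = [row + [yi] for row, yi in zip(coeff, y)]
    return rank(coeff) == rank(augmented)

v1, v2, v3 = (1, -1, -2), (5, -4, -7), (-3, 1, 0)

# Test every integer h from -10 to 10.
hits = [h for h in range(-10, 11) if in_span([v1, v2, v3], (-4, 3, h))]
print(hits)
```

Scanning integer values of h only illustrates the result; row reducing the augmented matrix with h kept symbolic is what shows that h = 5 is the only solution.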

Although many vector spaces in this chapter will be subspaces of $\mathbb{R}^n$, it is important
to keep in mind that the abstract theory applies to other vector spaces as well. Vector
spaces of functions arise in many applications, and they will receive more attention later.

PRACTICE PROBLEMS 

1. Show that the set H of all points in $\mathbb{R}^2$ of the form (3s, 2 + 5s) is not a vector space,
by showing that it is not closed under scalar multiplication. (Find a specific vector
u in H and a scalar c such that cu is not in H.)

2. Let W = Span{$\mathbf{v}_1, \ldots, \mathbf{v}_p$}, where $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are in a vector space V. Show that $\mathbf{v}_k$
is in W for $1 \le k \le p$. [Hint: First write an equation that shows that $\mathbf{v}_1$ is in W.
Then adjust your notation for the general case.]


4.1 EXERCISES 

1. Let V be the first quadrant in the xy-plane; that is, let

\[ V = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge 0, \ y \ge 0 \right\} \]

a. If u and v are in V, is u + v in V? Why?

b. Find a specific vector u in V and a specific scalar c such
that cu is not in V. (This is enough to show that V is not
a vector space.)

2. Let W be the union of the first and third quadrants in the xy-plane.
That is, let $W = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : xy \ge 0 \right\}$.

a. If u is in W and c is any scalar, is cu in W? Why?

b. Find specific vectors u and v in W such that u + v is
not in W. This is enough to show that W is not a vector
space.

3. Let H be the set of points inside and on the unit circle in
the xy-plane. That is, let $H = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x^2 + y^2 \le 1 \right\}$. Find
a specific example (two vectors, or a vector and a scalar) to
show that H is not a subspace of $\mathbb{R}^2$.

4. Construct a geometric figure that illustrates why a line in $\mathbb{R}^2$
not through the origin is not closed under vector addition.

In Exercises 5-8, determine if the given set is a subspace of $\mathbb{P}_n$ for
an appropriate value of n. Justify your answers.

5. All polynomials of the form $\mathbf{p}(t) = at^2$, where a is in $\mathbb{R}$.

6. All polynomials of the form $\mathbf{p}(t) = a + t^2$, where a is in $\mathbb{R}$.

7. All polynomials of degree at most 3, with integers as coefficients.

8. All polynomials in $\mathbb{P}_n$ such that $\mathbf{p}(0) = 0$.

9. Let H be the set of all vectors of the form $\begin{bmatrix} -2t \\ 5t \\ 3t \end{bmatrix}$. Find a
vector v in $\mathbb{R}^3$ such that H = Span{v}. Why does this show
that H is a subspace of $\mathbb{R}^3$?

10. Let H be the set of all vectors of the form $\begin{bmatrix} 3t \\ 0 \\ -7t \end{bmatrix}$, where t
is any real number. Show that H is a subspace of $\mathbb{R}^3$. (Use
the method of Exercise 9.)

11. Let W be the set of all vectors of the form $\begin{bmatrix} 2b + 3c \\ -b \\ 2c \end{bmatrix}$,
where b and c are arbitrary. Find vectors u and v such that
W = Span{u, v}. Why does this show that W is a subspace
of $\mathbb{R}^3$?

12. Let W be the set of all vectors of the form $\begin{bmatrix} 2s + 4t \\ 2s \\ 2s - 3t \\ 5t \end{bmatrix}$.
Show that W is a subspace of $\mathbb{R}^4$. (Use the method of
Exercise 11.)

13. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ 2 \\ 6 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}$.

a. Is w in {$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$}? How many vectors are in {$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$}?

b. How many vectors are in Span{$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$}?

c. Is w in the subspace spanned by {$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$}? Why?

14. Let $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ be as in Exercise 13, and let $\mathbf{w} = \begin{bmatrix} 1 \\ 3 \\ 14 \end{bmatrix}$. Is w
in the subspace spanned by {$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$}? Why?

In Exercises 15-18, let W be the set of all vectors of the form
shown, where a, b, and c represent arbitrary real numbers. In
each case, either find a set S of vectors that spans W or give an
example to show that W is not a vector space.

15. $\begin{bmatrix} 2a + 3b \\ -1 \\ 2a - 5b \end{bmatrix}$

16. $\begin{bmatrix} 1 \\ 3a - 5b \\ 3b + 2a \end{bmatrix}$

17. $\begin{bmatrix} 2a - b \\ 3b - c \\ 3c - a \\ 3b \end{bmatrix}$

18. $\begin{bmatrix} 4a + 3b \\ 0 \\ a + 3b + c \\ 3b - 2c \end{bmatrix}$

19. If a mass m is placed at the end of a spring, and if the mass is
pulled downward and released, the mass-spring system will
begin to oscillate. The displacement y of the mass from its
resting position is given by a function of the form

\[ y(t) = c_1 \cos \omega t + c_2 \sin \omega t \tag{5} \]

where $\omega$ is a constant that depends on the spring and the mass.
(See the figure below.) Show that the set of all functions
described in (5) (with $\omega$ fixed and $c_1$, $c_2$ arbitrary) is a vector
space.

20. The set of all continuous real-valued functions defined on a
closed interval [a, b] in $\mathbb{R}$ is denoted by C[a, b]. This set is
a subspace of the vector space of all real-valued functions
defined on [a, b].

a. What facts about continuous functions should be proved
in order to demonstrate that C[a, b] is indeed a subspace
as claimed? (These facts are usually discussed in a
calculus class.)

b. Show that {f in C[a, b] : f(a) = f(b)} is a subspace of
C[a, b].

For fixed positive integers m and n, the set $M_{m \times n}$ of all $m \times n$
matrices is a vector space, under the usual operations of addition
of matrices and multiplication by real scalars.

21. Determine if the set H of all matrices of the form $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$
is a subspace of $M_{2 \times 2}$.

22. Let F be a fixed $3 \times 2$ matrix, and let H be the set of all
matrices A in $M_{2 \times 4}$ with the property that FA = 0 (the zero
matrix in $M_{3 \times 4}$). Determine if H is a subspace of $M_{2 \times 4}$.

In Exercises 23 and 24, mark each statement True or False. Justify
each answer.

23. a. If f is a function in the vector space V of all real-valued
functions on $\mathbb{R}$ and if f(t) = 0 for some t, then f is the
zero vector in V.

b. A vector is an arrow in three-dimensional space.

c. A subset H of a vector space V is a subspace of V if the
zero vector is in H.

d. A subspace is also a vector space.

e. Analog signals are used in the major control systems for
the space shuttle, mentioned in the introduction to the
chapter.

24. a. A vector is any element of a vector space.

b. If u is a vector in a vector space V, then (-1)u is the same
as the negative of u.

c. A vector space is also a subspace.

d. $\mathbb{R}^2$ is a subspace of $\mathbb{R}^3$.

e. A subset H of a vector space V is a subspace of V if the
following conditions are satisfied: (i) the zero vector of
V is in H, (ii) u, v, and u + v are in H, and (iii) c is a
scalar and cu is in H.

Exercises 25-29 show how the axioms for a vector space V can
be used to prove the elementary properties described after the
definition of a vector space. Fill in the blanks with the appropriate
axiom numbers. Because of Axiom 2, Axioms 4 and 5 imply,
respectively, that 0 + u = u and -u + u = 0 for all u.

25. Complete the following proof that the zero vector is
unique. Suppose that w in V has the property that
u + w = w + u = u for all u in V. In particular, 0 + w = 0.
But 0 + w = w, by Axiom ____. Hence w = 0 + w = 0.

26. Complete the following proof that -u is the unique vector
in V such that u + (-u) = 0. Suppose that w satisfies
u + w = 0. Adding -u to both sides, we have

(-u) + [u + w] = (-u) + 0
[(-u) + u] + w = (-u) + 0      by Axiom ____ (a)
0 + w = (-u) + 0               by Axiom ____ (b)
w = -u                         by Axiom ____ (c)

27. Fill in the missing axiom numbers in the following proof that
0u = 0 for every u in V.

0u = (0 + 0)u = 0u + 0u        by Axiom ____ (a)

Add the negative of 0u to both sides:

0u + (-0u) = [0u + 0u] + (-0u)
0u + (-0u) = 0u + [0u + (-0u)] by Axiom ____ (b)
0 = 0u + 0                     by Axiom ____ (c)
0 = 0u                         by Axiom ____ (d)

28. Fill in the missing axiom numbers in the following proof that
c0 = 0 for every scalar c.

c0 = c(0 + 0)                  by Axiom ____ (a)
   = c0 + c0                   by Axiom ____ (b)

Add the negative of c0 to both sides:

c0 + (-c0) = [c0 + c0] + (-c0)
c0 + (-c0) = c0 + [c0 + (-c0)] by Axiom ____ (c)
0 = c0 + 0                     by Axiom ____ (d)
0 = c0                         by Axiom ____ (e)

29. Prove that (-1)u = -u. [Hint: Show that u + (-1)u = 0.
Use some axioms and the results of Exercises 27 and 26.]


30. Suppose cu = 0 for some nonzero scalar c. Show that u = 0. 
Mention the axioms or properties you use. 

31. Let u and v be vectors in a vector space V, and let H be any
subspace of V that contains both u and v. Explain why H
also contains Span{u, v}. This shows that Span{u, v} is the
smallest subspace of V that contains both u and v.

32. Let H and K be subspaces of a vector space V. The
intersection of H and K, written as $H \cap K$, is the set of
v in V that belong to both H and K. Show that $H \cap K$ is
a subspace of V. (See the figure.) Give an example in $\mathbb{R}^2$
to show that the union of two subspaces is not, in general, a
subspace.



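33. Given subspaces H and K of a vector space V, the sum of
H and K, written as H + K, is the set of all vectors in V
that can be written as the sum of two vectors, one in H and
the other in K; that is,

\[
H + K = \{\mathbf{w} : \mathbf{w} = \mathbf{u} + \mathbf{v} \text{ for some } \mathbf{u} \text{ in } H \text{ and some } \mathbf{v} \text{ in } K\}
\]

a. Show that H + K is a subspace of V.

b. Show that H is a subspace of H + K and K is a subspace
of H + K.

34. Suppose $\mathbf{u}_1, \dots, \mathbf{u}_p$ and $\mathbf{v}_1, \dots, \mathbf{v}_q$ are vectors in a vector
space V, and let

\[
H = \operatorname{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_p\} \quad \text{and} \quad K = \operatorname{Span}\{\mathbf{v}_1, \dots, \mathbf{v}_q\}
\]

Show that $H + K = \operatorname{Span}\{\mathbf{u}_1, \dots, \mathbf{u}_p, \mathbf{v}_1, \dots, \mathbf{v}_q\}$.

35. [M] Show that w is in the subspace of $\mathbb{R}^4$ spanned by
$\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$, where

\[
\mathbf{w} = \begin{bmatrix} 9 \\ -4 \\ -4 \\ 7 \end{bmatrix}, \quad
\mathbf{v}_1 = \begin{bmatrix} 8 \\ -4 \\ -3 \\ 9 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -4 \\ 3 \\ -2 \\ -8 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -7 \\ 6 \\ -5 \\ -18 \end{bmatrix}
\]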

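36. [M] Determine if y is in the subspace of $\mathbb{R}^4$ spanned by the
columns of A, where

\[
\mathbf{y} = \begin{bmatrix} -4 \\ -8 \\ 6 \\ -5 \end{bmatrix}, \quad
A = \begin{bmatrix} 3 & -5 & -9 \\ 8 & 7 & -6 \\ -5 & -8 & 3 \\ 2 & -2 & -9 \end{bmatrix}
\]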

37. [M] The vector space $H = \operatorname{Span}\{1, \cos^2 t, \cos^4 t, \cos^6 t\}$
contains at least two interesting functions that will be used
in a later exercise:

\[
\begin{aligned}
f(t) &= 1 - 8\cos^2 t + 8\cos^4 t \\
g(t) &= -1 + 18\cos^2 t - 48\cos^4 t + 32\cos^6 t
\end{aligned}
\]

Study the graph of f for $0 \le t \le 2\pi$, and guess a simple formula
for f(t). Verify your conjecture by graphing the difference
between 1 + f(t) and your formula for f(t). (Hopefully,
you will see the constant function 1.) Repeat for g.

38. [M] Repeat Exercise 37 for the functions

\[
\begin{aligned}
f(t) &= 3\sin t - 4\sin^3 t \\
g(t) &= 1 - 8\sin^2 t + 8\sin^4 t \\
h(t) &= 5\sin t - 20\sin^3 t + 16\sin^5 t
\end{aligned}
\]

in the vector space $\operatorname{Span}\{1, \sin t, \sin^2 t, \dots, \sin^5 t\}$.


SOLUTIONS TO PRACTICE PROBLEMS 


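1. Take any u in H (say, $\mathbf{u} = \begin{bmatrix} 3 \\ 7 \end{bmatrix}$) and take any $c \ne 1$ (say, c = 2). Then
$c\mathbf{u} = \begin{bmatrix} 6 \\ 14 \end{bmatrix}$. If this is in H, then there is some s such that

\[
\begin{bmatrix} 3s \\ 2 + 5s \end{bmatrix} = \begin{bmatrix} 6 \\ 14 \end{bmatrix}
\]

That is, s = 2 and s = 12/5, which is impossible. So 2u is not in H and H is not a
vector space.

2. $\mathbf{v}_1 = 1\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_p$. This expresses $\mathbf{v}_1$ as a linear combination of
$\mathbf{v}_1, \dots, \mathbf{v}_p$, so $\mathbf{v}_1$ is in W. In general, $\mathbf{v}_k$ is in W because

\[
\mathbf{v}_k = 0\mathbf{v}_1 + \cdots + 0\mathbf{v}_{k-1} + 1\mathbf{v}_k + 0\mathbf{v}_{k+1} + \cdots + 0\mathbf{v}_p
\]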

4.2 NULL SPACES, COLUMN SPACES, AND LINEAR TRANSFORMATIONS 


In applications of linear algebra, subspaces of $\mathbb{R}^n$ usually arise in one of two ways: (1) as
the set of all solutions to a system of homogeneous linear equations or (2) as the set 
of all linear combinations of certain specified vectors. In this section, we compare and 
contrast these two descriptions of subspaces, allowing us to practice using the concept of 
a subspace. Actually, as you will soon discover, we have been working with subspaces 
ever since Section 1.3. The main new feature here is the terminology. The section 
concludes with a discussion of the kernel and range of a linear transformation. 


The Null Space of a Matrix 

Consider the following system of homogeneous equations: 


\[
\begin{aligned}
x_1 - 3x_2 - 2x_3 &= 0 \\
-5x_1 + 9x_2 + x_3 &= 0
\end{aligned}
\tag{1}
\]

In matrix form, this system is written as Ax = 0, where

\[
A = \begin{bmatrix} 1 & -3 & -2 \\ -5 & 9 & 1 \end{bmatrix}
\tag{2}
\]


Recall that the set of all x that satisfy (1) is called the solution set of the system (1). 
Often it is convenient to relate this set directly to the matrix A and the equation Ax = 0. 
We call the set of x that satisfy Ax = 0 the null space of the matrix A. 


DEFINITION The null space of an m × n matrix A, written as Nul A, is the set of all solutions
of the homogeneous equation Ax = 0. In set notation,

\[
\operatorname{Nul} A = \{\mathbf{x} : \mathbf{x} \text{ is in } \mathbb{R}^n \text{ and } A\mathbf{x} = \mathbf{0}\}
\]

A more dynamic description of Nul A is the set of all x in $\mathbb{R}^n$ that are mapped into
the zero vector of $\mathbb{R}^m$ via the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$. See Fig. 1.



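EXAMPLE 1 Let A be the matrix in (2) above, and let $\mathbf{u} = \begin{bmatrix} 5 \\ 3 \\ -2 \end{bmatrix}$. Determine if
u belongs to the null space of A.

SOLUTION To test if u satisfies Au = 0, simply compute

\[
A\mathbf{u} = \begin{bmatrix} 1 & -3 & -2 \\ -5 & 9 & 1 \end{bmatrix}
\begin{bmatrix} 5 \\ 3 \\ -2 \end{bmatrix}
= \begin{bmatrix} 5 - 9 + 4 \\ -25 + 27 - 2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

Thus u is in Nul A. ■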
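
The membership test in Example 1 is a single matrix-vector product, so it is easy to mirror in code. The following is a minimal sketch in plain Python; the helper names `mat_vec` and `in_null_space` are illustrative choices, not notation from the text.

```python
def mat_vec(A, x):
    """Multiply matrix A (a list of rows) by vector x (a list of numbers)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def in_null_space(A, x):
    """Return True when A x is the zero vector, that is, when x is in Nul A."""
    return all(entry == 0 for entry in mat_vec(A, x))


# Matrix (2) and the vector u from Example 1.
A = [[1, -3, -2],
     [-5, 9, 1]]
u = [5, 3, -2]

print(mat_vec(A, u))        # [0, 0]
print(in_null_space(A, u))  # True
```

With integer entries the comparison with zero is exact; for floating-point data a tolerance would be a more appropriate test.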

The term space in null space is appropriate because the null space of a matrix is a 
vector space, as shown in the next theorem. 


THEOREM 2 The null space of an m × n matrix A is a subspace of $\mathbb{R}^n$. Equivalently, the
set of all solutions to a system Ax = 0 of m homogeneous linear equations in
n unknowns is a subspace of $\mathbb{R}^n$.

PROOF Certainly Nul A is a subset of $\mathbb{R}^n$ because A has n columns. We must show
that Nul A satisfies the three properties of a subspace. Of course, 0 is in Nul A. Next,
let u and v represent any two vectors in Nul A. Then

\[
A\mathbf{u} = \mathbf{0} \quad \text{and} \quad A\mathbf{v} = \mathbf{0}
\]

To show that u + v is in Nul A, we must show that A(u + v) = 0. Using a property of
matrix multiplication, compute

\[
A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = \mathbf{0} + \mathbf{0} = \mathbf{0}
\]

Thus u + v is in Nul A, and Nul A is closed under vector addition. Finally, if c is any
scalar, then

\[
A(c\mathbf{u}) = c(A\mathbf{u}) = c(\mathbf{0}) = \mathbf{0}
\]

which shows that cu is in Nul A. Thus Nul A is a subspace of $\mathbb{R}^n$. ■

EXAMPLE 2 Let H be the set of all vectors in $\mathbb{R}^4$ whose coordinates a, b, c, d
satisfy the equations a - 2b + 5c = d and c - a = b. Show that H is a subspace of
$\mathbb{R}^4$.

SOLUTION Rearrange the equations that describe the elements of H, and note that H
is the set of all solutions of the following system of homogeneous linear equations:

\[
\begin{aligned}
a - 2b + 5c - d &= 0 \\
-a - b + c &= 0
\end{aligned}
\]

By Theorem 2, H is a subspace of $\mathbb{R}^4$. ■

It is important that the linear equations defining the set H are homogeneous. 
Otherwise, the set of solutions will definitely not be a subspace (because the zero vector 
is not a solution of a nonhomogeneous system). Also, in some cases, the set of solutions 
could be empty. 

An Explicit Description of Nul A

There is no obvious relation between vectors in Nul A and the entries in A. We say that
Nul A is defined implicitly, because it is defined by a condition that must be checked.
No explicit list or description of the elements in Nul A is given. However, solving
the equation Ax = 0 amounts to producing an explicit description of Nul A. The next
example reviews the procedure from Section 1.5.

EXAMPLE 3 Find a spanning set for the null space of the matrix

\[
A = \begin{bmatrix} -3 & 6 & -1 & 1 & -7 \\ 1 & -2 & 2 & 3 & -1 \\ 2 & -4 & 5 & 8 & -4 \end{bmatrix}
\]

SOLUTION The first step is to find the general solution of Ax = 0 in terms of free
variables. Row reduce the augmented matrix [A 0] to reduced echelon form in order
to write the basic variables in terms of the free variables:

\[
\begin{bmatrix} 1 & -2 & 0 & -1 & 3 & 0 \\ 0 & 0 & 1 & 2 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},
\qquad
\begin{aligned}
x_1 - 2x_2 - x_4 + 3x_5 &= 0 \\
x_3 + 2x_4 - 2x_5 &= 0 \\
0 &= 0
\end{aligned}
\]

The general solution is $x_1 = 2x_2 + x_4 - 3x_5$, $x_3 = -2x_4 + 2x_5$, with $x_2$, $x_4$, and $x_5$
free. Next, decompose the vector giving the general solution into a linear combination
of vectors where the weights are the free variables. That is,

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} 2x_2 + x_4 - 3x_5 \\ x_2 \\ -2x_4 + 2x_5 \\ x_4 \\ x_5 \end{bmatrix}
= x_2 \underbrace{\begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{\mathbf{u}}
+ x_4 \underbrace{\begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}}
+ x_5 \underbrace{\begin{bmatrix} -3 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{w}}
= x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w}
\tag{3}
\]

Every linear combination of u, v, and w is an element of Nul A. Thus {u, v, w} is a
spanning set for Nul A. ■


Two points should be made about the solution of Example 3 that apply to all 
problems of this type where Nul A contains nonzero vectors. We will use these facts 
later. 


1. The spanning set produced by the method in Example 3 is automatically linearly
independent because the free variables are the weights on the spanning vectors. For
instance, look at the 2nd, 4th, and 5th entries in the solution vector in (3) and note
that $x_2\mathbf{u} + x_4\mathbf{v} + x_5\mathbf{w}$ can be 0 only if the weights $x_2$, $x_4$, and $x_5$ are all zero.

2. When Nul A contains nonzero vectors, the number of vectors in the spanning set for 
Nul A equals the number of free variables in the equation Ax = 0. 
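
The procedure of Example 3 (row reduce, then read off one spanning vector per free variable) can be sketched in code. The version below is only an illustration under stated assumptions: it uses exact rational arithmetic from Python's `fractions` module, and the function names `rref` and `null_space_basis` are hypothetical helpers, not part of the text.

```python
from fractions import Fraction


def rref(M):
    """Return (R, pivots): the reduced echelon form of M and its pivot column indices."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(cols):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column, so it belongs to a free variable
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots


def null_space_basis(A):
    """One spanning vector of Nul A per free variable, as in Example 3."""
    R, pivots = rref(A)
    n = len(A[0])
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)              # weight 1 on this free variable
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][f]     # basic variable written in terms of the free one
        basis.append(v)
    return basis


A = [[-3, 6, -1, 1, -7],
     [1, -2, 2, 3, -1],
     [2, -4, 5, 8, -4]]

for v in null_space_basis(A):
    print([str(x) for x in v])
# ['2', '1', '0', '0', '0']
# ['1', '0', '-2', '1', '0']
# ['-3', '0', '2', '0', '1']
```

The three printed vectors are u, v, and w from (3), and their number equals the number of free variables, in line with the two observations above.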


The Column Space of a Matrix 

Another important subspace associated with a matrix is its column space. Unlike the 
null space, the column space is defined explicitly via linear combinations. 


DEFINITION The column space of an m × n matrix A, written as Col A, is the set of all linear
combinations of the columns of A. If $A = [\mathbf{a}_1 \ \cdots \ \mathbf{a}_n]$, then

\[
\operatorname{Col} A = \operatorname{Span}\{\mathbf{a}_1, \dots, \mathbf{a}_n\}
\]

Since $\operatorname{Span}\{\mathbf{a}_1, \dots, \mathbf{a}_n\}$ is a subspace, by Theorem 1, the next theorem follows from
the definition of Col A and the fact that the columns of A are in $\mathbb{R}^m$.

THEOREM 3 The column space of an m × n matrix A is a subspace of $\mathbb{R}^m$.

Note that a typical vector in Col A can be written as Ax for some x because the
notation Ax stands for a linear combination of the columns of A. That is,

\[
\operatorname{Col} A = \{\mathbf{b} : \mathbf{b} = A\mathbf{x} \text{ for some } \mathbf{x} \text{ in } \mathbb{R}^n\}
\]

The notation Ax for vectors in Col A also shows that Col A is the range of the linear
transformation $\mathbf{x} \mapsto A\mathbf{x}$. We will return to this point of view at the end of the section.

Second, use the vectors in the spanning set as the columns of A. Let

\[
A = \begin{bmatrix} 6 & -1 \\ 1 & 1 \\ -7 & 0 \end{bmatrix}
\]

Then W = Col A, as desired. ■

Recall from Theorem 4 in Section 1.4 that the columns of A span $\mathbb{R}^m$ if and only if
the equation Ax = b has a solution for each b. We can restate this fact as follows:

The column space of an m × n matrix A is all of $\mathbb{R}^m$ if and only if the equation
Ax = b has a solution for each b in $\mathbb{R}^m$.


The Contrast Between Nul A and Col A


It is natural to wonder how the null space and column space of a matrix are related. In 
fact, the two spaces are quite dissimilar, as Examples 5-7 will show. Nevertheless, 
a surprising connection between the null space and column space will emerge in 
Section 4.6, after more theory is available. 


EXAMPLE 5 Let

\[
A = \begin{bmatrix} 2 & 4 & -2 & 1 \\ -2 & -5 & 7 & 3 \\ 3 & 7 & -8 & 6 \end{bmatrix}
\]

a. If the column space of A is a subspace of $\mathbb{R}^k$, what is k?

b. If the null space of A is a subspace of $\mathbb{R}^k$, what is k?

SOLUTION

a. The columns of A each have three entries, so Col A is a subspace of $\mathbb{R}^k$, where
k = 3.

b. A vector x such that Ax is defined must have four entries, so Nul A is a subspace of
$\mathbb{R}^k$, where k = 4. ■

When a matrix is not square, as in Example 5, the vectors in Nul A and Col A live
in entirely different "universes." For example, no linear combination of vectors in $\mathbb{R}^3$
can produce a vector in $\mathbb{R}^4$. When A is square, Nul A and Col A do have the zero vector
in common, and in special cases it is possible that some nonzero vectors belong to both
Nul A and Col A.




















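EXAMPLE 6 With A as in Example 5, find a nonzero vector in Col A and a nonzero
vector in Nul A.

SOLUTION It is easy to find a vector in Col A. Any column of A will do, say, $\begin{bmatrix} 2 \\ -2 \\ 3 \end{bmatrix}$.
To find a nonzero vector in Nul A, row reduce the augmented matrix [A 0] and obtain

\[
[\,A \ \ \mathbf{0}\,] \sim \begin{bmatrix} 1 & 0 & 9 & 0 & 0 \\ 0 & 1 & -5 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}
\]

Thus, if x satisfies Ax = 0, then $x_1 = -9x_3$, $x_2 = 5x_3$, $x_4 = 0$, and $x_3$ is free.
Assigning a nonzero value to $x_3$ (say, $x_3 = 1$), we obtain a vector in Nul A, namely,
x = (-9, 5, 1, 0). ■

EXAMPLE 7 With A as in Example 5, let $\mathbf{u} = \begin{bmatrix} 3 \\ -2 \\ -1 \\ 0 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 3 \\ -1 \\ 3 \end{bmatrix}$.

a. Determine if u is in Nul A. Could u be in Col A?

b. Determine if v is in Col A. Could v be in Nul A?

SOLUTION

a. An explicit description of Nul A is not needed here. Simply compute the product
Au.

\[
A\mathbf{u} = \begin{bmatrix} 2 & 4 & -2 & 1 \\ -2 & -5 & 7 & 3 \\ 3 & 7 & -8 & 6 \end{bmatrix}
\begin{bmatrix} 3 \\ -2 \\ -1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ -3 \\ 3 \end{bmatrix}
\ne \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

Obviously, u is not a solution of Ax = 0, so u is not in Nul A. Also, with four entries,
u could not possibly be in Col A, since Col A is a subspace of $\mathbb{R}^3$.

b. Reduce [A v] to an echelon form.

\[
[\,A \ \ \mathbf{v}\,] = \begin{bmatrix} 2 & 4 & -2 & 1 & 3 \\ -2 & -5 & 7 & 3 & -1 \\ 3 & 7 & -8 & 6 & 3 \end{bmatrix}
\sim \begin{bmatrix} 2 & 4 & -2 & 1 & 3 \\ 0 & 1 & -5 & -4 & -2 \\ 0 & 0 & 0 & 17 & 1 \end{bmatrix}
\]

At this point, it is clear that the equation Ax = v is consistent, so v is in Col A. With
only three entries, v could not possibly be in Nul A, since Nul A is a subspace of
$\mathbb{R}^4$. ■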
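
The test in Example 7(b) amounts to asking whether the augmented matrix [A v] has a pivot in its last column. A short sketch of that check follows, assuming exact rational arithmetic; `rref` and `in_column_space` are illustrative helper names, not notation from the text.

```python
from fractions import Fraction


def rref(M):
    """Return (R, pivots): the reduced echelon form of M and its pivot column indices."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots = []
    r = 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots


def in_column_space(A, b):
    """True when Ax = b is consistent, that is, when b is in Col A."""
    if len(b) != len(A):
        return False  # wrong number of entries: b lives in a different R^k
    augmented = [row + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(augmented)
    # The system is inconsistent exactly when the last column is a pivot column.
    return len(A[0]) not in pivots


A = [[2, 4, -2, 1],
     [-2, -5, 7, 3],
     [3, 7, -8, 6]]

print(in_column_space(A, [3, -1, 3]))      # True  (the vector v of Example 7)
print(in_column_space(A, [3, -2, -1, 0]))  # False (u has four entries)
```

The length check encodes the remark that a vector with four entries cannot belong to a subspace of $\mathbb{R}^3$.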

The table on page 204 summarizes what we have learned about Nul A and Col A. 
Item 8 is a restatement of Theorems 11 and 12(a) in Section 1.9. 

Kernel and Range of a Linear Transformation 

Subspaces of vector spaces other than $\mathbb{R}^n$ are often described in terms of a linear
transformation instead of a matrix. To make this precise, we generalize the definition
given in Section 1.8.




Contrast Between Nul A and Col A for an m × n Matrix A

Nul A

1. Nul A is a subspace of $\mathbb{R}^n$.

2. Nul A is implicitly defined; that is, you are
given only a condition (Ax = 0) that vectors
in Nul A must satisfy.

3. It takes time to find vectors in Nul A. Row
operations on [A 0] are required.

4. There is no obvious relation between Nul A
and the entries in A.

5. A typical vector v in Nul A has the property
that Av = 0.

6. Given a specific vector v, it is easy to tell if
v is in Nul A. Just compute Av.

7. Nul A = {0} if and only if the equation
Ax = 0 has only the trivial solution.

8. Nul A = {0} if and only if the linear
transformation $\mathbf{x} \mapsto A\mathbf{x}$ is one-to-one.

Col A

1. Col A is a subspace of $\mathbb{R}^m$.

2. Col A is explicitly defined; that is, you are
told how to build vectors in Col A.

3. It is easy to find vectors in Col A. The
columns of A are displayed; others are
formed from them.

4. There is an obvious relation between Col A
and the entries in A, since each column of
A is in Col A.

5. A typical vector v in Col A has the property
that the equation Ax = v is consistent.

6. Given a specific vector v, it may take time
to tell if v is in Col A. Row operations on
[A v] are required.

7. Col A = $\mathbb{R}^m$ if and only if the equation
Ax = b has a solution for every b in $\mathbb{R}^m$.

8. Col A = $\mathbb{R}^m$ if and only if the linear
transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$.


DEFINITION A linear transformation T from a vector space V into a vector space W is a rule
that assigns to each vector x in V a unique vector T(x) in W, such that

(i) T(u + v) = T(u) + T(v) for all u, v in V, and

(ii) T(cu) = cT(u) for all u in V and all scalars c.


The kernel (or null space) of such a T is the set of all u in V such that T(u) = 0
(the zero vector in W). The range of T is the set of all vectors in W of the form T(x)
for some x in V. If T happens to arise as a matrix transformation (say, T(x) = Ax
for some matrix A), then the kernel and the range of T are just the null space and the
column space of A, as defined earlier.

It is not difficult to show that the kernel of T is a subspace of V. The proof is 
essentially the same as the one for Theorem 2. Also, the range of T is a subspace of W. 
See Fig. 2 and Exercise 30. 



FIGURE 2 Subspaces associated with 
a linear transformation. 


In applications, a subspace usually arises as either the kernel or the range of an
appropriate linear transformation. For instance, the set of all solutions of a homogeneous
linear differential equation turns out to be the kernel of a linear transformation.
Typically, such a linear transformation is described in terms of one or more derivatives
of a function. To explain this in any detail would take us too far afield at this point. So
we consider only two examples. The first explains why the operation of differentiation
is a linear transformation.


EXAMPLE 8 (Calculus required) Let V be the vector space of all real-valued functions
f defined on an interval [a, b] with the property that they are differentiable and
their derivatives are continuous functions on [a, b]. Let W be the vector space C[a, b]
of all continuous functions on [a, b], and let $D : V \to W$ be the transformation that
changes f in V into its derivative f′. In calculus, two simple differentiation rules are

\[
D(f + g) = D(f) + D(g) \quad \text{and} \quad D(cf) = cD(f)
\]

That is, D is a linear transformation. It can be shown that the kernel of D is the set of
constant functions on [a, b] and the range of D is the set W of all continuous functions
on [a, b]. ■
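
The two differentiation rules can be watched in action on polynomials, which form only a small part of the space V in Example 8. The sketch below assumes a polynomial is stored as its coefficient list `[a0, a1, a2, ...]`; the helpers `D`, `add`, and `scale` are illustrative names, and this check is not a proof of linearity.

```python
def D(p):
    """Derivative of the polynomial a0 + a1*t + a2*t^2 + ... given as [a0, a1, a2, ...]."""
    return [k * p[k] for k in range(1, len(p))] or [0]


def add(p, q):
    """Sum of two polynomials, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def scale(c, p):
    """Scalar multiple c * p."""
    return [c * a for a in p]


f = [1, -8, 0, 8]   # 1 - 8t + 8t^3
g = [0, 3, 5]       # 3t + 5t^2

print(D(add(f, g)) == add(D(f), D(g)))    # True: D(f + g) = D(f) + D(g)
print(D(scale(4, f)) == scale(4, D(f)))   # True: D(cf) = cD(f)
print(D([7]))                             # [0]: a constant lies in the kernel of D
```

Within this polynomial setting, the only inputs that `D` sends to the zero polynomial are the constants, which matches the description of the kernel in Example 8.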


EXAMPLE 9 (Calculus required) The differential equation

\[
y'' + \omega^2 y = 0 \tag{4}
\]

where $\omega$ is a constant, is used to describe a variety of physical systems, such as the
vibration of a weighted spring, the movement of a pendulum, and the voltage in an
inductance-capacitance electrical circuit. The set of solutions of (4) is precisely the
kernel of the linear transformation that maps a function y = f(t) into the function
$f''(t) + \omega^2 f(t)$. Finding an explicit description of this vector space is a problem in
differential equations. The solution set turns out to be the space described in Exercise 19
in Section 4.1. ■
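
As an illustration of membership in this kernel, consider a candidate function of the form $y(t) = c_1 \cos \omega t + c_2 \sin \omega t$. This form is assumed here for the sake of the check; that such functions account for every solution is a result from differential equations and is not shown. Differentiating twice gives

\[
\begin{aligned}
y'(t) &= -c_1 \omega \sin \omega t + c_2 \omega \cos \omega t \\
y''(t) &= -c_1 \omega^2 \cos \omega t - c_2 \omega^2 \sin \omega t = -\omega^2 y(t)
\end{aligned}
\]

so $y'' + \omega^2 y = 0$ for every choice of $c_1$ and $c_2$. Each such function is therefore mapped to the zero function, and sums and scalar multiples of such functions have the same form, as one expects of vectors in a kernel.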


PRACTICE PROBLEMS 


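1. Let $W = \left\{ \begin{bmatrix} a \\ b \\ c \end{bmatrix} : a - 3b - c = 0 \right\}$. Show in two different ways that W is a
subspace of $\mathbb{R}^3$. (Use two theorems.)

2. Let $A = \begin{bmatrix} 7 & -3 & 5 \\ -4 & 1 & -5 \\ -5 & 2 & -4 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$, and $\mathbf{w} = \begin{bmatrix} 7 \\ 6 \\ -3 \end{bmatrix}$. Suppose you know that
the equations Ax = v and Ax = w are both consistent. What can you say about the
equation Ax = v + w?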


4.2 EXERCISES 



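1. Determine if $\mathbf{w} = \begin{bmatrix} 1 \\ 3 \\ -4 \end{bmatrix}$ is in Nul A, where
$A = \begin{bmatrix} 3 & -5 & -3 \\ 6 & -2 & 0 \\ -8 & 4 & 1 \end{bmatrix}$.

2. Determine if $\mathbf{w} = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$ is in Nul A, where
$A = \begin{bmatrix} 2 & 6 & 4 \\ -3 & 2 & 5 \\ -5 & -4 & 1 \end{bmatrix}$.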























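In Exercises 3-6, find an explicit description of Nul A, by listing
vectors that span the null space.

3. A = [matrix not legible]

4. A = [matrix only partly legible; its rows begin 1 -3 and 0 0]

5. $A = \begin{bmatrix} 1 & -4 & 0 & 2 & 0 \\ 0 & 0 & 1 & -5 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{bmatrix}$

6. $A = \begin{bmatrix} 1 & 3 & -4 & -3 & 1 \\ 0 & 1 & -3 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

In Exercises 7-14, either use an appropriate theorem to show that
the given set, W, is a vector space, or find a specific example to
the contrary.

7. $\left\{ \begin{bmatrix} a \\ b \\ c \end{bmatrix} : a + b + c = 2 \right\}$

8. $\left\{ \begin{bmatrix} r \\ s \\ t \end{bmatrix} : 3r - 2 = 3s + t \right\}$

9. $\left\{ \begin{bmatrix} p \\ q \\ r \\ s \end{bmatrix} : \begin{array}{l} p - 3q = 4s \\ 2p = s + 5r \end{array} \right\}$

10. $\left\{ \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} : \begin{array}{l} 3a + b = c \\ a + b + 2c = 2d \end{array} \right\}$

11. $\left\{ \begin{bmatrix} s - 2t \\ 3 + 3s \\ 3s + t \\ 2s \end{bmatrix} : s, t \text{ real} \right\}$

12. $\left\{ \begin{bmatrix} 3p - 5q \\ 4q \\ p \\ q + 1 \end{bmatrix} : p, q \text{ real} \right\}$

13. $\left\{ \begin{bmatrix} c - 6d \\ d \\ c \end{bmatrix} : c, d \text{ real} \right\}$

14. $\left\{ \begin{bmatrix} -s + 3t \\ s - 2t \\ 5s - t \end{bmatrix} : s, t \text{ real} \right\}$

In Exercises 15 and 16, find A such that the given set is Col A.

15. $\left\{ \begin{bmatrix} 2s + t \\ r - s + 2t \\ 3r + s \\ 2r - s - t \end{bmatrix} : r, s, t \text{ real} \right\}$

16. $\left\{ \begin{bmatrix} b - c \\ 2b + 3d \\ b + 3c - 3d \\ c + d \end{bmatrix} : b, c, d \text{ real} \right\}$

For the matrices in Exercises 17-20, (a) find k such that Nul A is
a subspace of $\mathbb{R}^k$, and (b) find k such that Col A is a subspace of
$\mathbb{R}^k$.

17. $A = \begin{bmatrix} 6 & -4 \\ -3 & 2 \\ -9 & 6 \\ 9 & -6 \end{bmatrix}$

18. $A = \begin{bmatrix} 5 & -2 & 3 \\ -1 & 0 & -1 \\ 0 & -2 & -2 \\ -5 & 7 & 2 \end{bmatrix}$

19. $A = \begin{bmatrix} 4 & 5 & -2 & 6 & 0 \\ 1 & 1 & 0 & 1 & 0 \end{bmatrix}$

20. $A = \begin{bmatrix} 1 & -3 & 2 & 0 & -5 \end{bmatrix}$

21. With A as in Exercise 17, find a nonzero vector in Nul A and
a nonzero vector in Col A.

22. With A as in Exercise 18, find a nonzero vector in Nul A and
a nonzero vector in Col A.

23. Let A = [matrix not legible] and w = [vector not legible]. Determine if w is in
Col A. Is w in Nul A?

24. Let $A = \begin{bmatrix} 10 & -8 & -2 & -2 \\ 0 & 2 & 2 & -2 \\ 1 & -1 & 6 & 0 \\ 1 & 1 & 0 & -2 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} 2 \\ 2 \\ 0 \\ 2 \end{bmatrix}$. Determine
if w is in Col A. Is w in Nul A?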

In Exercises 25 and 26, A denotes an m × n matrix. Mark each
statement True or False. Justify each answer.

25. a. The null space of A is the solution set of the equation
Ax = 0.

b. The null space of an m × n matrix is in $\mathbb{R}^m$.

c. The column space of A is the range of the mapping
$\mathbf{x} \mapsto A\mathbf{x}$.

d. If the equation Ax = b is consistent, then Col A is $\mathbb{R}^m$.

e. The kernel of a linear transformation is a vector space.

f. Col A is the set of all vectors that can be written as Ax for
some x.

26. a. A null space is a vector space.

b. The column space of an m × n matrix is in $\mathbb{R}^m$.

c. Col A is the set of all solutions of Ax = b.

d. Nul A is the kernel of the mapping $\mathbf{x} \mapsto A\mathbf{x}$.

e. The range of a linear transformation is a vector space.

f. The set of all solutions of a homogeneous linear differential
equation is the kernel of a linear transformation.

27. It can be shown that a solution of the system below is $x_1 = 3$,
$x_2 = 2$, and $x_3 = -1$. Use this fact and the theory from this
section to explain why another solution is $x_1 = 30$, $x_2 = 20$,
and $x_3 = -10$. (Observe how the solutions are related, but
make no other calculations.)

\[
\begin{aligned}
x_1 - 3x_2 - 3x_3 &= 0 \\
-2x_1 + 4x_2 + 2x_3 &= 0 \\
-x_1 + 5x_2 + 7x_3 &= 0
\end{aligned}
\]

28. Consider the following two systems of equations:

\[
\begin{aligned}
5x_1 + x_2 - 3x_3 &= 0 \\
-9x_1 + 2x_2 + 5x_3 &= 1 \\
4x_1 + x_2 - 6x_3 &= 9
\end{aligned}
\qquad\qquad
\begin{aligned}
5x_1 + x_2 - 3x_3 &= 0 \\
-9x_1 + 2x_2 + 5x_3 &= 5 \\
4x_1 + x_2 - 6x_3 &= 45
\end{aligned}
\]

It can be shown that the first system has a solution. Use
this fact and the theory from this section to explain why the
second system must also have a solution. (Make no row
operations.)

29. Prove Theorem 3 as follows: Given an m × n matrix A, an
element in Col A has the form Ax for some x in $\mathbb{R}^n$. Let Ax
and Aw represent any two vectors in Col A.

a. Explain why the zero vector is in Col A.

b. Show that the vector Ax + Aw is in Col A.

c. Given a scalar c, show that c(Ax) is in Col A.

30. Let $T : V \to W$ be a linear transformation from a vector
space V into a vector space W. Prove that the range of T
is a subspace of W. [Hint: Typical elements of the range
have the form T(x) and T(w) for some x, w in V.]

31. Define $T : \mathbb{P}_2 \to \mathbb{R}^2$ by $T(\mathbf{p}) = \begin{bmatrix} \mathbf{p}(0) \\ \mathbf{p}(1) \end{bmatrix}$. For instance, if
$\mathbf{p}(t) = 3 + 5t + 7t^2$, then $T(\mathbf{p}) = \begin{bmatrix} 3 \\ 15 \end{bmatrix}$.

a. Show that T is a linear transformation. [Hint: For
arbitrary polynomials p, q in $\mathbb{P}_2$, compute T(p + q) and
T(cp).]

b. Find a polynomial p in $\mathbb{P}_2$ that spans the kernel of T, and
describe the range of T.

32. Define a linear transformation $T : \mathbb{P}_2 \to \mathbb{R}^2$ by
$T(\mathbf{p}) = \begin{bmatrix} \mathbf{p}(0) \\ \mathbf{p}(0) \end{bmatrix}$. Find polynomials $\mathbf{p}_1$ and $\mathbf{p}_2$ in $\mathbb{P}_2$ that
span the kernel of T, and describe the range of T.

33. Let $M_{2 \times 2}$ be the vector space of all 2 × 2 matrices,
and define $T : M_{2 \times 2} \to M_{2 \times 2}$ by $T(A) = A + A^T$, where
$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

a. Show that T is a linear transformation.

b. Let B be any element of $M_{2 \times 2}$ such that $B^T = B$. Find
an A in $M_{2 \times 2}$ such that T(A) = B.

c. Show that the range of T is the set of B in $M_{2 \times 2}$ with the
property that $B^T = B$.

d. Describe the kernel of T.

34. (Calculus required) Define $T : C[0,1] \to C[0,1]$ as follows:
For f in C[0,1], let T(f) be the antiderivative F of f such
that F(0) = 0. Show that T is a linear transformation, and
describe the kernel of T. (See the notation in Exercise 20 of
Section 4.1.)

35. Let V and W be vector spaces, and let $T : V \to W$ be a linear
transformation. Given a subspace U of V, let T(U) denote
the set of all images of the form T(x), where x is in U. Show
that T(U) is a subspace of W.

36. Given $T : V \to W$ as in Exercise 35, and given a subspace
Z of W, let U be the set of all x in V such that T(x) is in Z.
Show that U is a subspace of V.

37. [M] Determine whether w is in the column space of A, the
null space of A, or both, where w is a vector in $\mathbb{R}^4$ whose last entry is -3
[the other entries of w are not legible] and

\[
A = \begin{bmatrix} 7 & 6 & -4 & 1 \\ -5 & -1 & 0 & -2 \\ 9 & -11 & 7 & -3 \\ 19 & -9 & 7 & 1 \end{bmatrix}
\]

38. [M] Determine whether w is in the column space of A, the
null space of A, or both, where

\[
\mathbf{w} = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \quad
A = \begin{bmatrix} -8 & 5 & -2 & 0 \\ -5 & 2 & 1 & -2 \\ 10 & -8 & 6 & -3 \\ 3 & -2 & 1 & 0 \end{bmatrix}
\]

39. [M] Let $\mathbf{a}_1, \dots, \mathbf{a}_5$ denote the columns of the matrix A,
where

\[
A = \begin{bmatrix} 5 & 1 & 2 & 2 & 0 \\ 3 & 3 & 2 & -1 & -12 \\ 8 & 4 & 4 & -5 & 12 \\ 2 & 1 & 1 & 0 & -2 \end{bmatrix}, \quad
B = [\,\mathbf{a}_1 \ \ \mathbf{a}_2 \ \ \mathbf{a}_4\,]
\]

a. Explain why $\mathbf{a}_3$ and $\mathbf{a}_5$ are in the column space of B.

b. Find a set of vectors that spans Nul A.

c. Let $T : \mathbb{R}^5 \to \mathbb{R}^4$ be defined by T(x) = Ax. Explain why
T is neither one-to-one nor onto.

40. [M] Let $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ and $K = \operatorname{Span}\{\mathbf{v}_3, \mathbf{v}_4\}$, where

\[
\mathbf{v}_1 = \begin{bmatrix} 5 \\ 3 \\ 8 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}, \quad
\mathbf{v}_4 = \begin{bmatrix} 0 \\ -12 \\ -28 \end{bmatrix}
\]

Then H and K are subspaces of $\mathbb{R}^3$. In fact, H and
K are planes in $\mathbb{R}^3$ through the origin, and they intersect
in a line through 0. Find a nonzero vector w that generates
that line. [Hint: w can be written as $c_1\mathbf{v}_1 + c_2\mathbf{v}_2$
and also as $c_3\mathbf{v}_3 + c_4\mathbf{v}_4$. To build w, solve the equation
$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = c_3\mathbf{v}_3 + c_4\mathbf{v}_4$ for the unknown $c_j$'s.]

SG Mastering: Vector Space, Subspace, Col A, and Nul A 4-6


SOLUTIONS TO PRACTICE PROBLEMS


1. First method: W is a subspace of $\mathbb{R}^3$ by Theorem 2 because W is the set of all
solutions to a system of homogeneous linear equations (where the system has only one
equation). Equivalently, W is the null space of the 1 × 3 matrix $A = [\,1 \ \ {-3} \ \ {-1}\,]$.

Second method: Solve the equation a - 3b - c = 0 for the leading variable a in
terms of the free variables b and c. Any solution has the form $\begin{bmatrix} 3b + c \\ b \\ c \end{bmatrix}$, where b
and c are arbitrary, and

\[
\begin{bmatrix} 3b + c \\ b \\ c \end{bmatrix}
= b \underbrace{\begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{v}_1}
+ c \underbrace{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}_2}
\]

This calculation shows that $W = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$. Thus W is a subspace of $\mathbb{R}^3$ by
Theorem 1. We could also solve the equation a - 3b - c = 0 for b or c and get
alternative descriptions of W as a set of linear combinations of two vectors.

2. Both v and w are in Col A. Since Col A is a vector space, v + w must be in Col A.
That is, the equation Ax = v + w is consistent.

4.3 LINEARLY INDEPENDENT SETS; BASES


In this section we identify and study the subsets that span a vector space V or a subspace
H as "efficiently" as possible. The key idea is that of linear independence, defined as
in $\mathbb{R}^n$.

An indexed set of vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ in V is said to be linearly independent if
the vector equation

\[
c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p = \mathbf{0} \tag{1}
\]

has only the trivial solution, $c_1 = 0, \dots, c_p = 0$.¹

The set $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ is said to be linearly dependent if (1) has a nontrivial solution,
that is, if there are some weights, $c_1, \dots, c_p$, not all zero, such that (1) holds. In such a
case, (1) is called a linear dependence relation among $\mathbf{v}_1, \dots, \mathbf{v}_p$.

Just as in $\mathbb{R}^n$, a set containing a single vector v is linearly independent if and only if
$\mathbf{v} \ne \mathbf{0}$. Also, a set of two vectors is linearly dependent if and only if one of the vectors
is a multiple of the other. And any set containing the zero vector is linearly dependent.
The following theorem has the same proof as Theorem 7 in Section 1.7.

THEOREM 4 An indexed set $\{\mathbf{v}_1, \dots, \mathbf{v}_p\}$ of two or more vectors, with $\mathbf{v}_1 \ne \mathbf{0}$, is linearly
dependent if and only if some $\mathbf{v}_j$ (with j > 1) is a linear combination of the
preceding vectors, $\mathbf{v}_1, \dots, \mathbf{v}_{j-1}$.

The main difference between linear dependence in $\mathbb{R}^n$ and in a general vector space
is that when the vectors are not n-tuples, the homogeneous equation (1) usually cannot
be written as a system of n linear equations. That is, the vectors cannot be made into
the columns of a matrix A in order to study the equation Ax = 0. We must rely instead
on the definition of linear dependence and on Theorem 4.

EXAMPLE 1 Let $\mathbf{p}_1(t) = 1$, $\mathbf{p}_2(t) = t$, and $\mathbf{p}_3(t) = 4 - t$. Then $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ is
linearly dependent in $\mathbb{P}$ because $\mathbf{p}_3 = 4\mathbf{p}_1 - \mathbf{p}_2$. ■


¹ It is convenient to use $c_1, \dots, c_p$ in (1) for the scalars instead of $x_1, \dots, x_p$, as we did in Chapter 1.

FIGURE 1 The standard basis for $\mathbb{R}^3$.


EXAMPLE 2 The set {sin t, cos t} is linearly independent in C[0,1], the space of
all continuous functions on $0 \le t \le 1$, because sin t and cos t are not multiples of one
another as vectors in C[0,1]. That is, there is no scalar c such that cos t = c · sin t for
all t in [0,1]. (Look at the graphs of sin t and cos t.) However, {sin t cos t, sin 2t} is
linearly dependent because of the identity: sin 2t = 2 sin t cos t, for all t. ■


DEFINITION Let H be a subspace of a vector space V. An indexed set of vectors
$\mathcal{B} = \{\mathbf{b}_1, \dots, \mathbf{b}_p\}$ in V is a basis for H if

(i) $\mathcal{B}$ is a linearly independent set, and

(ii) the subspace spanned by $\mathcal{B}$ coincides with H; that is,

\[
H = \operatorname{Span}\{\mathbf{b}_1, \dots, \mathbf{b}_p\}
\]


The definition of a basis applies to the case when H = V, because any vector space
is a subspace of itself. Thus a basis of V is a linearly independent set that spans V.
Observe that when $H \ne V$, condition (ii) includes the requirement that each of the
vectors $\mathbf{b}_1, \dots, \mathbf{b}_p$ must belong to H, because $\operatorname{Span}\{\mathbf{b}_1, \dots, \mathbf{b}_p\}$ contains $\mathbf{b}_1, \dots, \mathbf{b}_p$,
as shown in Section 4.1.


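EXAMPLE 3 Let A be an invertible n × n matrix (say, $A = [\,\mathbf{a}_1 \ \cdots \ \mathbf{a}_n\,]$). Then
the columns of A form a basis for $\mathbb{R}^n$ because they are linearly independent and they
span $\mathbb{R}^n$, by the Invertible Matrix Theorem. ■


EXAMPLE 4 Let $\mathbf{e}_1, \dots, \mathbf{e}_n$ be the columns of the n × n identity matrix, $I_n$. That
is,

\[
\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
\mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad
\mathbf{e}_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}
\]

The set $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ is called the standard basis for $\mathbb{R}^n$ (Fig. 1). ■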


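EXAMPLE 5 Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 0 \\ -6 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -4 \\ 1 \\ 7 \end{bmatrix}$, and $\mathbf{v}_3 = \begin{bmatrix} -2 \\ 1 \\ 5 \end{bmatrix}$. Determine if
$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $\mathbb{R}^3$.

SOLUTION Since there are exactly three vectors here in $\mathbb{R}^3$, we can use any of several
methods to determine if the matrix $A = [\,\mathbf{v}_1 \ \ \mathbf{v}_2 \ \ \mathbf{v}_3\,]$ is invertible. For instance, two
row replacements reveal that A has three pivot positions. Thus A is invertible. As in
Example 3, the columns of A form a basis for $\mathbb{R}^3$. ■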

EXAMPLE 6 Let $S = \{1, t, t^2, \dots, t^n\}$. Verify that S is a basis for $\mathbb{P}_n$. This basis
is called the standard basis for $\mathbb{P}_n$.

SOLUTION Certainly S spans $\mathbb{P}_n$. To show that S is linearly independent, suppose that
$c_0, \dots, c_n$ satisfy

\[
c_0 \cdot 1 + c_1 t + c_2 t^2 + \cdots + c_n t^n = \mathbf{0}(t) \tag{2}
\]

This equality means that the polynomial on the left has the same values as the zero
polynomial on the right. A fundamental theorem in algebra says that the only polynomial
in $\mathbb{P}_n$ with more than n zeros is the zero polynomial. That is, equation (2) holds for all
t only if $c_0 = \cdots = c_n = 0$. This proves that S is linearly independent and hence is a
basis for $\mathbb{P}_n$. See Fig. 2. ■

FIGURE 2 The standard basis for $\mathbb{P}_2$.

Problems involving linear independence and spanning in $\mathbb{P}_n$ are handled best by a
technique to be discussed in Section 4.4.


The Spanning Set Theorem 

As we will see, a basis is an “efficient” spanning set that contains no unnecessary 
vectors. In fact, a basis can be constructed from a spanning set by discarding unneeded 
vectors. 


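EXAMPLE 7 Let

\[
\mathbf{v}_1 = \begin{bmatrix} 0 \\ 2 \\ -1 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 6 \\ 16 \\ -5 \end{bmatrix}, \quad
\text{and} \quad H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}.
\]

Note that $\mathbf{v}_3 = 5\mathbf{v}_1 + 3\mathbf{v}_2$, and show that $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$. Then find a
basis for the subspace H.

SOLUTION Every vector in $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ belongs to H because

\[
c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + 0\mathbf{v}_3
\]

Now let x be any vector in H (say, $\mathbf{x} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$). Since $\mathbf{v}_3 = 5\mathbf{v}_1 + 3\mathbf{v}_2$,
we may substitute

\[
\begin{aligned}
\mathbf{x} &= c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3(5\mathbf{v}_1 + 3\mathbf{v}_2) \\
&= (c_1 + 5c_3)\mathbf{v}_1 + (c_2 + 3c_3)\mathbf{v}_2
\end{aligned}
\]

Thus x is in $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$, so every vector in H already belongs to $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$. We
conclude that H and $\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$ are actually the same set of vectors. It follows that
$\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis of H since $\{\mathbf{v}_1, \mathbf{v}_2\}$ is obviously linearly independent. ■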


The next theorem generalizes Example 7. 


The Spanning Set Theorem 

Let S = , y^} be a set in V, and let H = Span {vi,..., v^}. 

a. If one of the vectors in S —say, y^—is a linear combination of the remaining 
vectors in S, then the set formed from S by removing \k still spans H. 

b. If H ^ {0}, some subset of is a basis for H. 

PROOF 

a. By rearranging the list of vectors in S, if necessary, we may suppose that $\mathbf{v}_p$ is a
linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_{p-1}$, say,

\[
\mathbf{v}_p = a_1\mathbf{v}_1 + \cdots + a_{p-1}\mathbf{v}_{p-1} \tag{3}
\]

Given any x in H, we may write

\[
\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_{p-1}\mathbf{v}_{p-1} + c_p\mathbf{v}_p \tag{4}
\]

for suitable scalars $c_1, \ldots, c_p$. Substituting the expression for $\mathbf{v}_p$ from (3) into (4),
it is easy to see that x is a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_{p-1}$. Thus $\{\mathbf{v}_1, \ldots, \mathbf{v}_{p-1}\}$
spans H, because x was an arbitrary element of H.

b. If the original spanning set S is linearly independent, then it is already a basis for H.
Otherwise, one of the vectors in S depends on the others and can be deleted, by part
(a). So long as there are two or more vectors in the spanning set, we can repeat this
process until the spanning set is linearly independent and hence is a basis for H. If
the spanning set is eventually reduced to one vector, that vector will be nonzero (and
hence linearly independent) because $H \neq \{\mathbf{0}\}$. ■
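
The pruning process in the proof of part (b) can be sketched in code. The following is a minimal Python sketch, not part of the text: it scans a spanning set from left to right and discards every vector that is a linear combination of the vectors already kept, which the Spanning Set Theorem permits. The helper names `rank` and `prune_to_basis` are hypothetical, and exact arithmetic with `fractions.Fraction` is an assumption made to avoid rounding. The data are the vectors (0, 2, -1), (2, 2, 0), (6, 16, -5) of Example 7.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, found by forward elimination with exact fractions."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0  # number of pivot rows found so far
    for c in range(len(m[0]) if m else 0):
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def prune_to_basis(spanning_set):
    """Keep a vector only if it is not a linear combination of those already kept."""
    kept = []
    for v in spanning_set:
        if rank(kept + [v]) > len(kept):
            kept.append(v)
    return kept

v1, v2, v3 = (0, 2, -1), (2, 2, 0), (6, 16, -5)
# v3 = 5*v1 + 3*v2, so it is discarded and {v1, v2} remains
print(prune_to_basis([v1, v2, v3]))
```

Scanning left to right is one design choice among several; removing vectors in a different order can produce a different, equally valid basis for the same subspace.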


Bases for Nul A and Col A

We already know how to find vectors that span the null space of a matrix A. The 
discussion in Section 4.2 pointed out that our method always produces a linearly 
independent set when Nul A contains nonzero vectors. So, in this case, that method 
produces a basis for Nul A. 
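
As a hedged illustration of that method, the sketch below row reduces a matrix with exact fractions and builds one null-space vector per free variable. The function names `rref` and `null_basis` and the sample matrix are assumptions chosen for illustration; they are not taken from this section.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]       # scale the pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:          # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_basis(rows):
    """One basis vector of Nul A per free variable (free variable set to 1)."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -R[r][f]   # basic variable expressed through the free one
        basis.append(v)
    return basis

# illustrative matrix (an assumption, not data from this section)
A = [[-3, 6, -1, 1, -7],
     [1, -2, 2, 3, -1],
     [2, -4, 5, 8, -4]]
for v in null_basis(A):
    print([str(x) for x in v])
```

Each vector produced has a 1 in its own free-variable position and 0 in the other free positions, which is why the set is automatically linearly independent.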

The next two examples describe a simple algorithm for finding a basis for the 
column space. 


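EXAMPLE 8 Find a basis for Col B, where

\[
B = [\,\mathbf{b}_1 \ \ \mathbf{b}_2 \ \ \cdots \ \ \mathbf{b}_5\,] =
\begin{bmatrix}
1 & 4 & 0 & 2 & 0 \\
0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]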


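SOLUTION Each nonpivot column of B is a linear combination of the pivot columns.
In fact, $\mathbf{b}_2 = 4\mathbf{b}_1$ and $\mathbf{b}_4 = 2\mathbf{b}_1 - \mathbf{b}_3$. By the Spanning Set Theorem, we may discard
$\mathbf{b}_2$ and $\mathbf{b}_4$, and $\{\mathbf{b}_1, \mathbf{b}_3, \mathbf{b}_5\}$ will still span Col B. Let

\[
S = \{\mathbf{b}_1, \mathbf{b}_3, \mathbf{b}_5\} = \left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\right\}
\]

Since $\mathbf{b}_1 \neq \mathbf{0}$ and no vector in S is a linear combination of the vectors that precede it,
S is linearly independent (Theorem 4). Thus S is a basis for Col B. ■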

What about a matrix A that is not in reduced echelon form? Recall that any 
linear dependence relationship among the columns of A can be expressed in the form 
Ax = 0, where x is a column of weights. (If some columns are not involved in a 
particular dependence relation, then their weights are zero.) When A is row reduced 
to a matrix B, the columns of B are often totally different from the columns of A. 
However, the equations Ax = 0 and Bx = 0 have exactly the same set of solutions. If 
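$A = [\,\mathbf{a}_1 \ \cdots \ \mathbf{a}_n\,]$ and $B = [\,\mathbf{b}_1 \ \cdots \ \mathbf{b}_n\,]$, then the vector equations

\[
x_1\mathbf{a}_1 + \cdots + x_n\mathbf{a}_n = \mathbf{0} \quad \text{and} \quad x_1\mathbf{b}_1 + \cdots + x_n\mathbf{b}_n = \mathbf{0}
\]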

also have the same set of solutions. That is, the columns of A have exactly the same 
linear dependence relationships as the columns of B. 


EXAMPLE 9 It can be shown that the matrix

\[
A = [\,\mathbf{a}_1 \ \ \mathbf{a}_2 \ \ \cdots \ \ \mathbf{a}_5\,] =
\begin{bmatrix}
1 & 4 & 0 & 2 & -1 \\
3 & 12 & 1 & 5 & 5 \\
2 & 8 & 1 & 3 & 2 \\
5 & 20 & 2 & 8 & 8
\end{bmatrix}
\]

is row equivalent to the matrix B in Example 8. Find a basis for Col A.

SOLUTION In Example 8 we saw that

\[
\mathbf{b}_2 = 4\mathbf{b}_1 \quad \text{and} \quad \mathbf{b}_4 = 2\mathbf{b}_1 - \mathbf{b}_3
\]

so we can expect that

\[
\mathbf{a}_2 = 4\mathbf{a}_1 \quad \text{and} \quad \mathbf{a}_4 = 2\mathbf{a}_1 - \mathbf{a}_3
\]

Check that this is indeed the case! Thus we may discard $\mathbf{a}_2$ and $\mathbf{a}_4$ when selecting
a minimal spanning set for Col A. In fact, $\{\mathbf{a}_1, \mathbf{a}_3, \mathbf{a}_5\}$ must be linearly independent
because any linear dependence relationship among $\mathbf{a}_1, \mathbf{a}_3, \mathbf{a}_5$ would imply a linear
dependence relationship among $\mathbf{b}_1, \mathbf{b}_3, \mathbf{b}_5$. But we know that $\{\mathbf{b}_1, \mathbf{b}_3, \mathbf{b}_5\}$ is a linearly
independent set. Thus $\{\mathbf{a}_1, \mathbf{a}_3, \mathbf{a}_5\}$ is a basis for Col A. The columns we have used for
this basis are the pivot columns of A. ■

Examples 8 and 9 illustrate the following useful fact.

THEOREM 6

The pivot columns of a matrix A form a basis for Col A.


PROOF The general proof uses the arguments discussed above. Let B be the reduced 
echelon form of A. The set of pivot columns of B is linearly independent, for no 
vector in the set is a linear combination of the vectors that precede it. Since A is row 
equivalent to B , the pivot columns of A are linearly independent as well, because any 
linear dependence relation among the columns of A corresponds to a linear dependence 
relation among the columns of B. For this same reason, every nonpivot column of A is 
a linear combination of the pivot columns of A. Thus the nonpivot columns of A may be 
discarded from the spanning set for Col A, by the Spanning Set Theorem. This leaves 
the pivot columns of A as a basis for Col A. ■

Warning: The pivot columns of a matrix A are evident when A has been reduced only 
to echelon form. But, be careful to use the pivot columns of A itself for the basis of 
Col A. Row operations can change the column space of a matrix. The columns of an 
echelon form B of A are often not in the column space of A. For instance, the columns 
of matrix B in Example 8 all have zeros in their last entries, so they cannot span the 
column space of matrix A in Example 9. 
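
The warning can be checked numerically. The sketch below is an illustration under assumptions, with hypothetical helper names `rref` and `col_basis`: it row reduces the matrix A of Example 9 only to locate the pivot positions, then returns the corresponding columns of A itself, not the columns of the echelon form.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                       # no pivot in this column
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def col_basis(rows):
    """Pivot columns of the ORIGINAL matrix, as tuples."""
    _, pivots = rref(rows)
    return [tuple(row[j] for row in rows) for j in pivots]

A = [[1, 4, 0, 2, -1],
     [3, 12, 1, 5, 5],
     [2, 8, 1, 3, 2],
     [5, 20, 2, 8, 8]]

B, pivots = rref(A)
print(pivots)        # zero-based indices of the pivot columns
print(col_basis(A))  # columns a1, a3, a5 of A
```

Every column of the reduced matrix `B` has a zero in its last entry, while the basis columns taken from `A` do not, which is the point of the warning.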

Two Views of a Basis 

When the Spanning Set Theorem is used, the deletion of vectors from a spanning set 
must stop when the set becomes linearly independent. If an additional vector is deleted, 
it will not be a linear combination of the remaining vectors, and hence the smaller set 
will no longer span V. Thus a basis is a spanning set that is as small as possible. 

A basis is also a linearly independent set that is as large as possible. If S is a basis
for V, and if S is enlarged by one vector—say, w— from V, then the new set cannot be 
linearly independent, because S spans V, and w is therefore a linear combination of the 
elements in S. 

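EXAMPLE 10 The following three sets in $\mathbb{R}^3$ show how a linearly independent set
can be enlarged to a basis and how further enlargement destroys the linear independence
of the set. Also, a spanning set can be shrunk to a basis, but further shrinking destroys
the spanning property.

\[
\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix} \right\}
\qquad
\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} \right\}
\qquad
\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}, \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} \right\}
\]

First set: linearly independent but does not span $\mathbb{R}^3$. Second set: a basis
for $\mathbb{R}^3$. Third set: spans $\mathbb{R}^3$ but is linearly dependent. ■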


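PRACTICE PROBLEMS

1. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -2 \\ 7 \\ -9 \end{bmatrix}$. Determine if $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis for $\mathbb{R}^3$. Is
$\{\mathbf{v}_1, \mathbf{v}_2\}$ a basis for $\mathbb{R}^2$?

2. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 6 \\ 2 \\ -1 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 2 \\ -2 \\ 3 \end{bmatrix}$, and $\mathbf{v}_4 = \begin{bmatrix} -4 \\ -8 \\ 9 \end{bmatrix}$. Find a basis for
the subspace W spanned by $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$.

3. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $H = \left\{ \begin{bmatrix} s \\ s \\ 0 \end{bmatrix} : s \text{ in } \mathbb{R} \right\}$. Then every vector in H
is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$ because

\[
\begin{bmatrix} s \\ s \\ 0 \end{bmatrix} = s\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\]

Is $\{\mathbf{v}_1, \mathbf{v}_2\}$ a basis for H?


4.3 EXERCISES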


Determine whether the sets in Exercises 1-8 are bases for R 3 . 
Of the sets that are not bases, determine which ones are linearly 


Find bases for the null spaces of the matrices given in Exercises 9 
and 10. Refer to the remarks that follow Example 3 in Section 4.2. 


1 


1 


1 

0 

, 

1 

, 

1 

0 


0 


1 

1 


3 



2 . 


Justify your answers. 


' 1 

0 

-2 

-2" 


'1 

1 

-2 

1 

5" 

1 


0 


0 


9 . 

0 

1 

1 

4 

10 . 

0 

1 

0 

-1 

-2 

1 

, 

0 

, 

1 



3 

-1 

-7 

3 


0 

0 

-8 

0 

16 

0 


0 


1 














3 . 


4 . 



3 


-3 


0 


0 


1 


-4 

5 . 

-3 

, 

7 

, 

0 

, 

-3 

6 . 

2 

, 

3 


0 


0 


0 


5 


-4 


6 


-2 


6 


1 

0 


2 



7 . 


8 . 


11. Find a basis for the set of vectors in $\mathbb{R}^3$ in the plane
$x - 3y + 2z = 0$. [Hint: Think of the equation as a “system” of homogeneous equations.]

12. Find a basis for the set of vectors in $\mathbb{R}^2$ on the line $y = -3x$.

In Exercises 13 and 14, assume that A is row equivalent to B. 
Find bases for Nul A and Col A. 


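13. \[
A = \begin{bmatrix} -2 & 4 & -2 & -4 \\ 2 & -6 & -3 & 1 \\ -3 & 8 & 2 & -3 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 0 & 6 & 5 \\ 0 & 2 & 5 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]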


14. A


B 


2 3-48 

2 0 2 8 
4 -3 10 9 

6 0 6 9 

2 0 2 5 

0 3-63 

0 0 0 -7 

0 0 0 0 


In Exercises 15-18, find a basis for the space spanned by the given 
vectors, Vi,... ,Ys. 


15. 


16. 




2 


4 


-2 


8 


-8 



0 


0 


-4 


4 


4 

17. 

[M] 

-4 

, 

2 

, 

0 

, 

8 

, 

0 



—6 


-4 


1 


-3 


0 



0 


4 


-7 


15 


1 



-3 


3 


0 


6 


-6 



2 


0 


2 


-2 


3 

18. 

[M] 

6 

, 

-9 

, 

-4 

, 

-14 

， 

0 



0 


0 


0 


0 


-1 



-7 


6 


一 1 


13 


0 



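19. Let $\mathbf{v}_1 = \begin{bmatrix} 4 \\ -3 \\ 7 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 9 \\ -2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 7 \\ 11 \\ 6 \end{bmatrix}$, and also let
$H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. It can be verified that $4\mathbf{v}_1 + 5\mathbf{v}_2 - 3\mathbf{v}_3 = \mathbf{0}$.
Use this information to find a basis for H. There is
more than one answer.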


1 



0 


2 


2 


3 

0 


1 


-2 


-1 


-1 

-2 

5 

2 


-8 


10 

7 

—6 

3 



3 


0 


3 


9 

1 


-2 


3 


5 


2 

0 


0 


-1 


-3 


-1 

0 


0 


1 


3 


1 

1 


2 


-1 


-4 


0 


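20. Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 4 \\ -2 \\ -5 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 4 \\ 3 \\ 2 \\ 4 \end{bmatrix}$, and $\mathbf{v}_3 = \begin{bmatrix} 2 \\ 5 \\ -6 \\ -14 \end{bmatrix}$. It can be
verified that $2\mathbf{v}_1 - \mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0}$. Use this information to find
a basis for $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.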


In Exercises 21 and 22, mark each statement True or False. Justify 
each answer. 


21. a. A single vector by itself is linearly dependent.

b. If $H = \operatorname{Span}\{\mathbf{b}_1, \ldots, \mathbf{b}_p\}$, then $\{\mathbf{b}_1, \ldots, \mathbf{b}_p\}$ is a basis for
H.

c. The columns of an invertible $n \times n$ matrix form a basis
for $\mathbb{R}^n$.

d. A basis is a spanning set that is as large as possible.

e. In some cases, the linear dependence relations among the
columns of a matrix can be affected by certain elementary
row operations on the matrix.


22. a. A linearly independent set in a subspace H is a basis for
H.

b. If a finite set S of nonzero vectors spans a vector space
V, then some subset of S is a basis for V.

c. A basis is a linearly independent set that is as large as
possible.

d. The standard method for producing a spanning set for
Nul A, described in Section 4.2, sometimes fails to produce
a basis for Nul A.

e. If B is an echelon form of a matrix A, then the pivot
columns of B form a basis for Col A.

23. Suppose $\mathbb{R}^4 = \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_4\}$. Explain why $\{\mathbf{v}_1, \ldots, \mathbf{v}_4\}$
is a basis for $\mathbb{R}^4$.


24. Let $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be a linearly independent set in $\mathbb{R}^n$.
Explain why $\mathcal{B}$ must be a basis for $\mathbb{R}^n$.



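25. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and let H be the
set of vectors in $\mathbb{R}^3$ whose second and third entries are equal.
Then every vector in H has a unique expansion as a linear
combination of $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$, because

\[
\begin{bmatrix} s \\ t \\ t \end{bmatrix} = s\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + (t - s)\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} + s\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\]

for any s and t. Is $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ a basis for H? Why or why
not?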

26. In the vector space of all real-valued functions, find a basis
for the subspace spanned by $\{\sin t, \sin 2t, \sin t \cos t\}$.

27. Let V be the vector space of functions that describe the 
vibration of a mass-spring system. (Refer to Exercise 19 in 
Section 4.1.) Find a basis for V. 

28. (RLC circuit) The circuit in the figure consists of a resistor
(R ohms), an inductor (L henrys), a capacitor (C farads), and
an initial voltage source. Let $b = R/(2L)$, and suppose R,
L, and C have been selected so that b also equals $1/\sqrt{LC}$.
(This is done, for instance, when the circuit is used in a
voltmeter.) Let $v(t)$ be the voltage (in volts) at time t,
measured across the capacitor. It can be shown that v is
in the null space H of the linear transformation that maps
$v(t)$ into $Lv''(t) + Rv'(t) + (1/C)v(t)$, and H consists of
all functions of the form $v(t) = e^{-bt}(c_1 + c_2 t)$. Find a basis
for H.

[Figure: circuit with a voltage source, resistor R, capacitor C, and inductor L.]

Exercises 29 and 30 show that every basis for $\mathbb{R}^n$ must contain
exactly n vectors.

29. Let $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ be a set of k vectors in $\mathbb{R}^n$, with $k < n$.
Use a theorem from Section 1.4 to explain why S cannot be
a basis for $\mathbb{R}^n$.

30. Let $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ be a set of k vectors in $\mathbb{R}^n$, with $k > n$.
Use a theorem from Chapter 1 to explain why S cannot be a
basis for $\mathbb{R}^n$.

Exercises 31 and 32 reveal an important connection between linear
independence and linear transformations and provide practice
using the definition of linear dependence. Let V and W be
vector spaces, let $T : V \to W$ be a linear transformation, and let
$\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ be a subset of V.

31. Show that if $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is linearly dependent in V, then
the set of images, $\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_p)\}$, is linearly dependent
in W. This fact shows that if a linear transformation
maps a set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ onto a linearly independent set
$\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_p)\}$, then the original set is linearly independent,
too (because it cannot be linearly dependent).

32. Suppose that T is a one-to-one transformation, so that an
equation $T(\mathbf{u}) = T(\mathbf{v})$ always implies $\mathbf{u} = \mathbf{v}$. Show that if
the set of images $\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_p)\}$ is linearly dependent,
then $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is linearly dependent. This fact shows that
a one-to-one linear transformation maps a linearly independent
set onto a linearly independent set (because in this case
the set of images cannot be linearly dependent).

33. Consider the polynomials $\mathbf{p}_1(t) = 1 + t^2$ and $\mathbf{p}_2(t) = 1 - t^2$.
Is $\{\mathbf{p}_1, \mathbf{p}_2\}$ a linearly independent set in $\mathbb{P}_3$? Why or why
not?

34. Consider the polynomials $\mathbf{p}_1(t) = 1 + t$, $\mathbf{p}_2(t) = 1 - t$, and
$\mathbf{p}_3(t) = 2$ (for all t). By inspection, write a linear dependence
relation among $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$. Then find a basis for
$\operatorname{Span}\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$.

35. Let V be a vector space that contains a linearly independent
set $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4\}$. Describe how to construct a set of
vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ in V such that $\{\mathbf{v}_1, \mathbf{v}_3\}$ is a basis for
$\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$.

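36. [M] Let $H = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $K = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$,
where

\[
\mathbf{u}_1 = \begin{bmatrix} 1 \\ 2 \\ 0 \\ -1 \end{bmatrix}, \quad
\mathbf{u}_2 = \begin{bmatrix} 0 \\ 2 \\ -1 \\ 1 \end{bmatrix}, \quad
\mathbf{u}_3 = \begin{bmatrix} 3 \\ 4 \\ 1 \\ -4 \end{bmatrix},
\]

\[
\mathbf{v}_1 = \begin{bmatrix} -2 \\ -2 \\ -1 \\ 3 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 2 \\ 3 \\ 2 \\ -6 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -1 \\ 4 \\ 6 \\ -2 \end{bmatrix}
\]

Find bases for H, K, and $H + K$. (See Exercises 33 and 34
in Section 4.1.)

37. [M] Show that $\{t, \sin t, \cos 2t, \sin t \cos t\}$ is a linearly independent
set of functions defined on $\mathbb{R}$. Start by assuming that

\[
c_1 \cdot t + c_2 \cdot \sin t + c_3 \cdot \cos 2t + c_4 \cdot \sin t \cos t = 0 \tag{5}
\]

Equation (5) must hold for all real t, so choose several
specific values of t (say, t = 0, .1, .2) until you get a system
of enough equations to determine that all the $c_j$ must be zero.

38. [M] Show that $\{1, \cos t, \cos^2 t, \ldots, \cos^6 t\}$ is a linearly independent
set of functions defined on $\mathbb{R}$. Use the method of
Exercise 37. (This result will be needed in Exercise 34 in
Section 4.5.)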


WEB 


SOLUTIONS TO PRACTICE PROBLEMS 


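1. Let $A = [\,\mathbf{v}_1 \ \ \mathbf{v}_2\,]$. Row operations show that

\[
A = \begin{bmatrix} 1 & -2 \\ -2 & 7 \\ 3 & -9 \end{bmatrix} \sim \begin{bmatrix} 1 & -2 \\ 0 & 3 \\ 0 & 0 \end{bmatrix}
\]

Not every row of A contains a pivot position. So the columns of A do not span $\mathbb{R}^3$,
by Theorem 4 in Section 1.4. Hence $\{\mathbf{v}_1, \mathbf{v}_2\}$ is not a basis for $\mathbb{R}^3$. Since $\mathbf{v}_1$ and
$\mathbf{v}_2$ are not in $\mathbb{R}^2$, they cannot possibly be a basis for $\mathbb{R}^2$. However, since $\mathbf{v}_1$ and $\mathbf{v}_2$
are obviously linearly independent, they are a basis for a subspace of $\mathbb{R}^3$, namely,
$\operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$.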

2. Set up a matrix A whose column space is the space spanned by $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$, and
then row reduce A to find its pivot columns.

\[
A = \begin{bmatrix} 1 & 6 & 2 & -4 \\ -3 & 2 & -2 & -8 \\ 4 & -1 & 3 & 9 \end{bmatrix}
\sim \begin{bmatrix} 1 & 6 & 2 & -4 \\ 0 & 20 & 4 & -20 \\ 0 & -25 & -5 & 25 \end{bmatrix}
\sim \begin{bmatrix} 1 & 6 & 2 & -4 \\ 0 & 5 & 1 & -5 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

The first two columns of A are the pivot columns and hence form a basis of
Col A = W. Hence $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis for W. Note that the reduced echelon form of
A is not needed in order to locate the pivot columns.

3. Neither $\mathbf{v}_1$ nor $\mathbf{v}_2$ is in H, so $\{\mathbf{v}_1, \mathbf{v}_2\}$ cannot be a basis for H. In fact, $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a
basis for the plane of all vectors of the form $(c_1, c_2, 0)$, but H is only a line.


4.4 COORDINATE SYSTEMS 

An important reason for specifying a basis $\mathcal{B}$ for a vector space V is to impose a
“coordinate system” on V. This section will show that if $\mathcal{B}$ contains n vectors, then
the coordinate system will make V act like $\mathbb{R}^n$. If V is already $\mathbb{R}^n$ itself, then $\mathcal{B}$ will
determine a coordinate system that gives a new “view” of V.

The existence of coordinate systems rests on the following fundamental result. 


THEOREM 7 The Unique Representation Theorem 

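Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis for a vector space V. Then for each x in V, there
exists a unique set of scalars $c_1, \ldots, c_n$ such that

\[
\mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n \tag{1}
\]


PROOF Since $\mathcal{B}$ spans V, there exist scalars such that (1) holds. Suppose x also has
the representation

\[
\mathbf{x} = d_1\mathbf{b}_1 + \cdots + d_n\mathbf{b}_n
\]

for scalars $d_1, \ldots, d_n$. Then, subtracting, we have

\[
\mathbf{0} = \mathbf{x} - \mathbf{x} = (c_1 - d_1)\mathbf{b}_1 + \cdots + (c_n - d_n)\mathbf{b}_n \tag{2}
\]

Since $\mathcal{B}$ is linearly independent, the weights in (2) must all be zero. That is, $c_j = d_j$
for $1 \le j \le n$. ■

DEFINITION Suppose $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ is a basis for V and x is in V. The coordinates of
x relative to the basis $\mathcal{B}$ (or the $\mathcal{B}$-coordinates of x) are the weights $c_1, \ldots, c_n$
such that $\mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n$.


If $c_1, \ldots, c_n$ are the $\mathcal{B}$-coordinates of x, then the vector in $\mathbb{R}^n$

\[
[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}
\]

is the coordinate vector of x (relative to $\mathcal{B}$), or the $\mathcal{B}$-coordinate vector of x. The
mapping $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$ is the coordinate mapping (determined by $\mathcal{B}$).$^1$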


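1 The concept of a coordinate mapping assumes that the basis $\mathcal{B}$ is an indexed set whose vectors are listed in
some fixed preassigned order. This property makes the definition of $[\mathbf{x}]_{\mathcal{B}}$ unambiguous.

EXAMPLE 1 Consider a basis $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ for $\mathbb{R}^2$, where $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and
$\mathbf{b}_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Suppose an x in $\mathbb{R}^2$ has the coordinate vector $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}$. Find x.

SOLUTION The $\mathcal{B}$-coordinates of x tell how to build x from the vectors in $\mathcal{B}$. That is,

\[
\mathbf{x} = (-2)\mathbf{b}_1 + 3\mathbf{b}_2 = (-2)\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}
\]
■

EXAMPLE 2 The entries in the vector $\mathbf{x} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}$ are the coordinates of x relative to
the standard basis $\mathcal{E} = \{\mathbf{e}_1, \mathbf{e}_2\}$, since

\[
\begin{bmatrix} 1 \\ 6 \end{bmatrix} = 1 \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 6 \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 1 \cdot \mathbf{e}_1 + 6 \cdot \mathbf{e}_2
\]

If $\mathcal{E} = \{\mathbf{e}_1, \mathbf{e}_2\}$, then $[\mathbf{x}]_{\mathcal{E}} = \mathbf{x}$. ■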


A Graphical Interpretation of Coordinates 

A coordinate system on a set consists of a one-to-one mapping of the points in the set
into $\mathbb{R}^n$. For example, ordinary graph paper provides a coordinate system for the plane
when one selects perpendicular axes and a unit of measurement on each axis. Figure 1
shows the standard basis $\{\mathbf{e}_1, \mathbf{e}_2\}$, the vectors $\mathbf{b}_1 (= \mathbf{e}_1)$ and $\mathbf{b}_2$ from Example 1, and the
vector $\mathbf{x} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}$. The coordinates 1 and 6 give the location of x relative to the standard
basis: 1 unit in the $\mathbf{e}_1$ direction and 6 units in the $\mathbf{e}_2$ direction.

Figure 2 shows the vectors $\mathbf{b}_1$, $\mathbf{b}_2$, and x from Fig. 1. (Geometrically, the three
vectors lie on a vertical line in both figures.) However, the standard coordinate grid
was erased and replaced by a grid especially adapted to the basis $\mathcal{B}$ in Example 1. The
coordinate vector $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} -2 \\ 3 \end{bmatrix}$ gives the location of x on this new coordinate system:
−2 units in the $\mathbf{b}_1$ direction and 3 units in the $\mathbf{b}_2$ direction.

FIGURE 1 Standard graph paper.

FIGURE 2 $\mathcal{B}$-graph paper.

EXAMPLE 3 In crystallography, the description of a crystal lattice is aided by 
choosing a basis {u, y, w} for R 3 that corresponds to three adjacent edges of one “unit 
cell” of the crystal. An entire lattice is constructed by stacking together many copies of 
one cell. There are fourteen basic types of unit cells; three are displayed in Fig. 3. 2 


2 Adapted from The Science and Engineering of Materials, 4th Ed., by Donald R. Askeland (Boston: 

Prindle, Weber & Schmidt, ©2002), p. 36. 



























































FIGURE 3 Examples of unit cells: (a) simple monoclinic, (b) body-centered cubic, (c) face-centered orthorhombic.

FIGURE 4 The $\mathcal{B}$-coordinate vector of x is (3, 2).


The coordinates of atoms within the crystal are given relative to the basis for the 
lattice. For instance, 

" 1 / 2 ' 

1/2 

identifies the top face-centered atom in the cell in Fig. 3 (c). ■ 


Coordinates in $\mathbb{R}^n$


When a basis $\mathcal{B}$ for $\mathbb{R}^n$ is fixed, the $\mathcal{B}$-coordinate vector of a specified x is easily found,
as in the next example.


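EXAMPLE 4 Let $\mathbf{b}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$, and $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. Find the
coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ of x relative to $\mathcal{B}$.

SOLUTION The $\mathcal{B}$-coordinates $c_1, c_2$ of x satisfy

\[
c_1 \underbrace{\begin{bmatrix} 2 \\ 1 \end{bmatrix}}_{\mathbf{b}_1} + c_2 \underbrace{\begin{bmatrix} -1 \\ 1 \end{bmatrix}}_{\mathbf{b}_2} = \underbrace{\begin{bmatrix} 4 \\ 5 \end{bmatrix}}_{\mathbf{x}}
\]

or

\[
\underbrace{\begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}}_{[\,\mathbf{b}_1 \ \mathbf{b}_2\,]} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \underbrace{\begin{bmatrix} 4 \\ 5 \end{bmatrix}}_{\mathbf{x}} \tag{3}
\]

This equation can be solved by row operations on an augmented matrix or by using
the inverse of the matrix on the left. In any case, the solution is $c_1 = 3$, $c_2 = 2$. Thus
$\mathbf{x} = 3\mathbf{b}_1 + 2\mathbf{b}_2$, and

\[
[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
\]
■

See Fig. 4.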
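
The computation in Example 4 amounts to solving a linear system whose coefficient columns are the basis vectors. A minimal sketch follows, assuming exact arithmetic with `fractions.Fraction`; the helper names `coordinates` and `from_coordinates` are hypothetical and are not notation from the text.

```python
from fractions import Fraction

def coordinates(basis, x):
    """Solve c1*b1 + ... + cn*bn = x for a basis of R^n by Gauss-Jordan elimination."""
    n = len(basis)
    # augmented matrix [b1 ... bn | x], one row per coordinate
    m = [[Fraction(b[i]) for b in basis] + [Fraction(x[i])] for i in range(n)]
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            raise ValueError("the given vectors are not a basis")
        m[c], m[p] = m[p], m[c]
        m[c] = [v / m[c][c] for v in m[c]]        # scale the pivot to 1
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n] for row in m]

def from_coordinates(basis, coords):
    """Rebuild x = c1*b1 + ... + cn*bn from its coordinates."""
    return [sum(c * b[i] for c, b in zip(coords, basis)) for i in range(len(basis[0]))]

b1, b2, x = (2, 1), (-1, 1), (4, 5)
c = coordinates([b1, b2], x)
print(c)                                # weights c1, c2 from Example 4
print(from_coordinates([b1, b2], c))    # recovers x
```

The two functions are inverse to each other on a fixed basis: one turns a vector into its weights, the other turns weights back into the vector.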


The matrix in (3) changes the $\mathcal{B}$-coordinates of a vector x into the standard
coordinates for x. An analogous change of coordinates can be carried out in $\mathbb{R}^n$ for
a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$. Let

\[
P_{\mathcal{B}} = [\,\mathbf{b}_1 \ \ \mathbf{b}_2 \ \ \cdots \ \ \mathbf{b}_n\,]
\]

Then the vector equation

\[
\mathbf{x} = c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + \cdots + c_n\mathbf{b}_n
\]

is equivalent to

\[
\mathbf{x} = P_{\mathcal{B}}[\mathbf{x}]_{\mathcal{B}} \tag{4}
\]

We call $P_{\mathcal{B}}$ the change-of-coordinates matrix from $\mathcal{B}$ to the standard basis in $\mathbb{R}^n$.
Left-multiplication by $P_{\mathcal{B}}$ transforms the coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ into x. The
change-of-coordinates equation (4) is important and will be needed at several points in Chapters 5
and 7.

Since the columns of $P_{\mathcal{B}}$ form a basis for $\mathbb{R}^n$, $P_{\mathcal{B}}$ is invertible (by the Invertible
Matrix Theorem). Left-multiplication by $P_{\mathcal{B}}^{-1}$ converts x into its $\mathcal{B}$-coordinate vector:

\[
P_{\mathcal{B}}^{-1}\mathbf{x} = [\mathbf{x}]_{\mathcal{B}}
\]

The correspondence $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$, produced here by $P_{\mathcal{B}}^{-1}$, is the coordinate mapping
mentioned earlier. Since $P_{\mathcal{B}}^{-1}$ is an invertible matrix, the coordinate mapping is a
one-to-one linear transformation from $\mathbb{R}^n$ onto $\mathbb{R}^n$, by the Invertible Matrix Theorem. (See
also Theorem 12 in Section 1.9.) This property of the coordinate mapping is also true
in a general vector space that has a basis, as we shall see.

The Coordinate Mapping

Choosing a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ for a vector space V introduces a coordinate system
in V. The coordinate mapping $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$ connects the possibly unfamiliar space V to
the familiar space $\mathbb{R}^n$. See Fig. 5. Points in V can now be identified by their new
“names.”

FIGURE 5 The coordinate mapping from V onto $\mathbb{R}^n$.

THEOREM 8

Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis for a vector space V. Then the coordinate
mapping $\mathbf{x} \mapsto [\mathbf{x}]_{\mathcal{B}}$ is a one-to-one linear transformation from V onto $\mathbb{R}^n$.


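PROOF Take two typical vectors in V, say,

\[
\begin{aligned}
\mathbf{u} &= c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n \\
\mathbf{w} &= d_1\mathbf{b}_1 + \cdots + d_n\mathbf{b}_n
\end{aligned}
\]

Then, using vector operations,

\[
\mathbf{u} + \mathbf{w} = (c_1 + d_1)\mathbf{b}_1 + \cdots + (c_n + d_n)\mathbf{b}_n
\]

It follows that

\[
[\mathbf{u} + \mathbf{w}]_{\mathcal{B}} = \begin{bmatrix} c_1 + d_1 \\ \vdots \\ c_n + d_n \end{bmatrix}
= \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} + \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix}
= [\mathbf{u}]_{\mathcal{B}} + [\mathbf{w}]_{\mathcal{B}}
\]

So the coordinate mapping preserves addition. If r is any scalar, then

\[
r\mathbf{u} = r(c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n) = (rc_1)\mathbf{b}_1 + \cdots + (rc_n)\mathbf{b}_n
\]

So

\[
[r\mathbf{u}]_{\mathcal{B}} = \begin{bmatrix} rc_1 \\ \vdots \\ rc_n \end{bmatrix}
= r\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = r[\mathbf{u}]_{\mathcal{B}}
\]

Thus the coordinate mapping also preserves scalar multiplication and hence is a linear
transformation. See Exercises 23 and 24 for verification that the coordinate mapping is
one-to-one and maps V onto $\mathbb{R}^n$. ■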


The linearity of the coordinate mapping extends to linear combinations, just as in
Section 1.8. If $\mathbf{u}_1, \ldots, \mathbf{u}_p$ are in V and if $c_1, \ldots, c_p$ are scalars, then

\[
[c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p]_{\mathcal{B}} = c_1[\mathbf{u}_1]_{\mathcal{B}} + \cdots + c_p[\mathbf{u}_p]_{\mathcal{B}} \tag{5}
\]

In words, (5) says that the $\mathcal{B}$-coordinate vector of a linear combination of $\mathbf{u}_1, \ldots, \mathbf{u}_p$ is
the same linear combination of their coordinate vectors.

The coordinate mapping in Theorem 8 is an important example of an isomorphism
from V onto $\mathbb{R}^n$. In general, a one-to-one linear transformation from a vector space V
onto a vector space W is called an isomorphism from V onto W (iso from the Greek
for “the same,” and morph from the Greek for “form” or “structure”). The notation and
terminology for V and W may differ, but the two spaces are indistinguishable as vector
spaces. Every vector space calculation in V is accurately reproduced in W, and vice
versa. In particular, any real vector space with a basis of n vectors is indistinguishable
from $\mathbb{R}^n$. See Exercises 25 and 26.


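EXAMPLE 5 Let $\mathcal{B}$ be the standard basis of the space $\mathbb{P}_3$ of polynomials; that is, let
$\mathcal{B} = \{1, t, t^2, t^3\}$. A typical element p of $\mathbb{P}_3$ has the form

\[
\mathbf{p}(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3
\]

Since p is already displayed as a linear combination of the standard basis vectors, we
conclude that

\[
[\mathbf{p}]_{\mathcal{B}} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
\]

Thus the coordinate mapping $\mathbf{p} \mapsto [\mathbf{p}]_{\mathcal{B}}$ is an isomorphism from $\mathbb{P}_3$ onto $\mathbb{R}^4$. All vector
space operations in $\mathbb{P}_3$ correspond to operations in $\mathbb{R}^4$. ■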


If we think of $\mathbb{P}_3$ and $\mathbb{R}^4$ as displays on two computer screens that are connected
via the coordinate mapping, then every vector space operation in $\mathbb{P}_3$ on one screen is
exactly duplicated by a corresponding vector operation in $\mathbb{R}^4$ on the other screen. The
vectors on the $\mathbb{P}_3$ screen look different from those on the $\mathbb{R}^4$ screen, but they “act” as
vectors in exactly the same way. See Fig. 6.

FIGURE 6 The space $\mathbb{P}_3$ is isomorphic to $\mathbb{R}^4$.

EXAMPLE 6 Use coordinate vectors to verify that the polynomials $1 + 2t^2$,
$4 + t + 5t^2$, and $3 + 2t$ are linearly dependent in $\mathbb{P}_2$.

SOLUTION The coordinate mapping from Example 5 produces the coordinate vectors
(1, 0, 2), (4, 1, 5), and (3, 2, 0), respectively. Writing these vectors as the columns of a
matrix A, we can determine their independence by row reducing the augmented matrix
for $A\mathbf{x} = \mathbf{0}$:

\[
\begin{bmatrix} 1 & 4 & 3 & 0 \\ 0 & 1 & 2 & 0 \\ 2 & 5 & 0 & 0 \end{bmatrix}
\sim \begin{bmatrix} 1 & 4 & 3 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

The columns of A are linearly dependent, so the corresponding polynomials are linearly
dependent. In fact, it is easy to check that column 3 of A is 2 times column 2 minus 5
times column 1. The corresponding relation for the polynomials is

\[
3 + 2t = 2(4 + t + 5t^2) - 5(1 + 2t^2)
\]
■
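
Example 6 can be replayed in code. This is a sketch under assumptions: polynomials in $\mathbb{P}_2$ are stored as coefficient lists `[a0, a1, a2]` relative to the standard basis $\{1, t, t^2\}$, and `rref` is a hypothetical helper using exact fractions. A column without a pivot signals a dependence relation, and the entries of the reduced echelon form in that column give the weights on the pivot columns.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# coordinate vectors of 1 + 2t^2, 4 + t + 5t^2, 3 + 2t
p1, p2, p3 = [1, 0, 2], [4, 1, 5], [3, 2, 0]

# the coordinate vectors become the COLUMNS of A
A = [list(row) for row in zip(p1, p2, p3)]
R, pivots = rref(A)

dependent = len(pivots) < 3
# weights expressing column 3 through the pivot columns 1 and 2
weights = [R[i][2] for i in range(len(pivots))]
print(dependent, [str(w) for w in weights])
```

Because the coordinate mapping is an isomorphism, a dependence relation found among the columns in $\mathbb{R}^3$ transfers unchanged to the polynomials.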

The final example concerns a plane in R 3 that is isomorphic to R 2 . 

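EXAMPLE 7 Let

\[
\mathbf{v}_1 = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix},
\]

and $\mathcal{B} = \{\mathbf{v}_1, \mathbf{v}_2\}$. Then $\mathcal{B}$ is a basis for $H = \operatorname{Span}\{\mathbf{v}_1, \mathbf{v}_2\}$. Determine if x is in H, and
if it is, find the coordinate vector of x relative to $\mathcal{B}$.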

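SOLUTION If x is in H, then the following vector equation is consistent:

\[
c_1\begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix}
\]

The scalars $c_1$ and $c_2$, if they exist, are the $\mathcal{B}$-coordinates of x. Using row operations,
we obtain

\[
\begin{bmatrix} 3 & -1 & 3 \\ 6 & 0 & 12 \\ 2 & 1 & 7 \end{bmatrix}
\sim \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix}
\]

Thus $c_1 = 2$, $c_2 = 3$, and $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$. The coordinate system on H determined by $\mathcal{B}$
is shown in Fig. 7. ■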
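
The membership test and coordinate computation of Example 7 can be sketched as follows. The helper names `rref` and `coords_in_span` are hypothetical. The function row reduces the augmented matrix $[\,\mathbf{v}_1 \ \mathbf{v}_2 \mid \mathbf{x}\,]$ and returns `None` when the last column is a pivot column, that is, when the system is inconsistent and x is not in H.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def coords_in_span(basis, x):
    """B-coordinates of x, or None if x is not in Span(basis).

    Assumes the vectors in `basis` are linearly independent.
    """
    k = len(basis)
    rows = [[b[i] for b in basis] + [x[i]] for i in range(len(x))]
    R, pivots = rref(rows)
    if k in pivots:          # pivot in the augmented column: inconsistent
        return None
    return [R[i][k] for i in range(k)]

v1, v2 = (3, 6, 2), (-1, 0, 1)
print(coords_in_span([v1, v2], (3, 12, 7)))   # x from Example 7
print(coords_in_span([v1, v2], (3, 12, 8)))   # a vector outside the plane H
```

Although x has three entries, its coordinate vector relative to the basis of H has only two, which reflects the fact that H behaves like $\mathbb{R}^2$.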



If a different basis for H were chosen, would the associated coordinate system also 
make H isomorphic to R 2 ? Surely, this must be true. We shall prove it in the next 
section. 


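PRACTICE PROBLEMS

1. Let $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -3 \\ 4 \\ 0 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 3 \\ -6 \\ 3 \end{bmatrix}$, and $\mathbf{x} = \begin{bmatrix} -8 \\ 2 \\ 3 \end{bmatrix}$.

a. Show that the set $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ is a basis of $\mathbb{R}^3$.

b. Find the change-of-coordinates matrix from $\mathcal{B}$ to the standard basis.

c. Write the equation that relates x in $\mathbb{R}^3$ to $[\mathbf{x}]_{\mathcal{B}}$.

d. Find $[\mathbf{x}]_{\mathcal{B}}$, for the x given above.

2. The set $\mathcal{B} = \{1 + t, 1 + t^2, t + t^2\}$ is a basis for $\mathbb{P}_2$. Find the coordinate vector of
$\mathbf{p}(t) = 6 + 3t - t^2$ relative to $\mathcal{B}$.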


4.4 EXERCISES 


In Exercises 1-4, find the vector x determined by the given
coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ and the given basis $\mathcal{B}$.



2. 13 = 







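3. $\mathcal{B} = \left\{ \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}, \begin{bmatrix} 5 \\ 0 \\ -2 \end{bmatrix}, \begin{bmatrix} 4 \\ -3 \\ 0 \end{bmatrix} \right\}$, $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}$

4. $\mathcal{B} = \left\{ \begin{bmatrix} -2 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 4 \\ -1 \\ 3 \end{bmatrix} \right\}$, $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} -3 \\ 2 \\ -1 \end{bmatrix}$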


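In Exercises 5-8, find the coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ of x relative to
the given basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$.

5. $\mathbf{b}_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 3 \\ -5 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

6. $\mathbf{b}_1 = \begin{bmatrix} 1 \\ -4 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 2 \\ -3 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} -1 \\ -6 \end{bmatrix}$

7. $\mathbf{b}_1 = \begin{bmatrix} 1 \\ -1 \\ -3 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -3 \\ 4 \\ 9 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 2 \\ -2 \\ 4 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 8 \\ -9 \\ 6 \end{bmatrix}$

8. $\mathbf{b}_1 = \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 2 \\ 0 \\ 8 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix}$

In Exercises 9 and 10, find the change-of-coordinates matrix from
$\mathcal{B}$ to the standard basis in $\mathbb{R}^n$.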


produce another representation of w as a linear combination 

Of Vi，. . . , V4-] 


9. 8 = 

10 . B = 



Let B = 




Since the coordinate mapping 


determined by ^ is a linear transformation from R 2 into R 2 , 


this mapping must be implemented by some 2x2 matrix A. 
Find it. [Hint: Multiplication by A should transform a vector 
x into its coordinate vector [ x ] B ，] 


In Exercises 11 and 12, use an inverse matrix to find [ x ] B for the 
given x and B. 



13. The set B = {I t 2 ,t t 2 ,l 2t 1 2 } is a basis for P】. 
Find the coordinate vector of p(/) = 1 + 4^ + It 1 relative 
to B. 


22. Let B = {bi,..., b„} be a basis for V. Produce a descrip¬ 
tion of an « x n matrix A that implements the coordinate 
mapping x i-^- [ x ] g . (See Exercise 21.) 

Exercises 23-26 concern a vector space V, a basis B = 
{bi,..., b„}, and the coordinate mapping x \-^ [x ] g . 

23. Show that the coordinate mapping is one-to-one. (Hint: 
Suppose [ u ] g = [ w ] B for some u and w in V, and show 
that u = w.) 


14. The set B = {\ — t 2 ,t — t 2 ,2 — t 1 2 } is a basis for P 2 . 
Find the coordinate vector of p(0 = 1 + 3? — 6t 2 relative 
to B. 


24. 


Show that the coordinate mapping is onto R n . That is, given 
any y in R”，with entries yi,..., y n , produce u in K such that 


[u] B = y. 


In Exercises 15 and 16, mark each statement True or False. Justify 
each answer. Unless stated otherwise, ^ is a basis for a vector 
space V. 


15. a. If x is in V and if $\mathcal{B}$ contains n vectors, then the
$\mathcal{B}$-coordinate vector of x is in $\mathbb{R}^n$.

b. If $P_{\mathcal{B}}$ is the change-of-coordinates matrix, then $[\mathbf{x}]_{\mathcal{B}} = P_{\mathcal{B}}\mathbf{x}$,
for x in V.

c. The vector spaces $\mathbb{P}_3$ and $\mathbb{R}^3$ are isomorphic.


16. a. If $\mathcal{B}$ is the standard basis for $\mathbb{R}^n$, then the $\mathcal{B}$-coordinate
vector of an x in $\mathbb{R}^n$ is x itself.


b. The correspondence $[\mathbf{x}]_{\mathcal{B}} \mapsto \mathbf{x}$ is called the coordinate
mapping.


c. In some cases, a plane in $\mathbb{R}^3$ can be isomorphic to $\mathbb{R}^2$.


17. The vectors Vi 


1 


2 


-3 

-3 

,V 2 = 

-8 

,v 3 = 

7 


span M . 2 


but do not form a basis. Find two different ways to express 
as a linear combination of Vi, V 2 , V 3 . 


18. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis for a vector space V. Explain
why the $\mathcal{B}$-coordinate vectors of $\mathbf{b}_1, \ldots, \mathbf{b}_n$ are the columns
$\mathbf{e}_1, \ldots, \mathbf{e}_n$ of the $n \times n$ identity matrix.


19. Let S be a finite set in a vector space V with the property
that every x in V has a unique representation as a linear
combination of elements of S. Show that S is a basis of V.


20. Suppose {vi ， … ， V 4 } is a linearly dependent spanning set 
for a vector space V. Show that each w in K can be 
expressed in more than one way as a linear combination of 
Vi,.. • ， V 4 . [Hint: Let w = + ... + be an arbitrary 

vector in V. Use the linear dependence of {v! ， … ， V 4 } to 


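25. Show that a subset $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ in V is linearly independent
if and only if the set of coordinate vectors
$\{[\mathbf{u}_1]_{\mathcal{B}}, \ldots, [\mathbf{u}_p]_{\mathcal{B}}\}$ is linearly independent in $\mathbb{R}^n$. Hint:
Since the coordinate mapping is one-to-one, the following
equations have the same solutions, $c_1, \ldots, c_p$.

\[
\begin{aligned}
c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p &= \mathbf{0} && \text{The zero vector in } V \\
[c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p]_{\mathcal{B}} &= [\mathbf{0}]_{\mathcal{B}} && \text{The zero vector in } \mathbb{R}^n
\end{aligned}
\]

26. Given vectors $\mathbf{u}_1, \ldots, \mathbf{u}_p$, and w in V, show that w is a linear
combination of $\mathbf{u}_1, \ldots, \mathbf{u}_p$ if and only if $[\mathbf{w}]_{\mathcal{B}}$ is a linear
combination of the coordinate vectors $[\mathbf{u}_1]_{\mathcal{B}}, \ldots, [\mathbf{u}_p]_{\mathcal{B}}$.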

In Exercises 27-30, use coordinate vectors to test the linear 
independence of the sets of polynomials. Explain your work. 

27. 1 + 2t\ 2 + 卜 3^ 2 , -t + It 1 - t 3 

28. \-2t 2 -t\ t + 2t\ \-\-t-2t 2 

29. (1 - 0 2 , + (1 -0 3 

30. (2 - t)\ (3 - t)\ 1 + 6r - 5/ 2 + t 3 

31. Use coordinate vectors to test whether the following sets of 
polynomials span IP 2 . Justify your conclusions. 

a. 1 _ 3? + 5r 2 , —3 + 5? — It 2 , —4 + 5, _ 6t 2 , 1 - t 2 

b. 1 — 8? — 2f2 ， 一 3 + + 2p, 2 — 3t 

32. Let $\mathbf{p}_1(t) = 1 + t^2$, $\mathbf{p}_2(t) = t - 3t^2$, $\mathbf{p}_3(t) = 1 + t - 3t^2$.

a. Use coordinate vectors to show that these polynomials
form a basis for $\mathbb{P}_2$.

b. Consider the basis $\mathcal{B} = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ for $\mathbb{P}_2$. Find q in $\mathbb{P}_2$,
given that $[\mathbf{q}]_{\mathcal{B}} = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}$.

In Exercises 33 and 34, determine whether the sets of polynomials
form a basis for $\mathbb{P}_3$. Justify your conclusions.

33. [M] 3 + 7，, 5 + r - 2t\t- 2t 2 , 1 + 16r - 6t 2 + 2t 3 

34. [M] 5 - 3r + 4/ 2 + 2t\ 9 + r + 8/ 2 - 6t 3 ,6-2t 5t 2 ,t 3 

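35. [M] Let $H = \operatorname{Span}\{v_1, v_2\}$ and $\mathcal{B} = \{v_1, v_2\}$. Show that x is
in H and find the $\mathcal{B}$-coordinate vector of x, for

$$v_1 = \begin{bmatrix} 11 \\ -5 \\ 10 \\ 7 \end{bmatrix},\quad v_2 = \begin{bmatrix} 14 \\ -8 \\ 13 \\ 10 \end{bmatrix},\quad x = \begin{bmatrix} 19 \\ -13 \\ 18 \\ 15 \end{bmatrix}$$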

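36. [M] Let $H = \operatorname{Span}\{v_1, v_2, v_3\}$ and $\mathcal{B} = \{v_1, v_2, v_3\}$. Show
that $\mathcal{B}$ is a basis for H and x is in H, and find the $\mathcal{B}$-coordinate
vector of x, for

$$v_1 = \begin{bmatrix} -6 \\ 4 \\ -9 \\ 4 \end{bmatrix},\quad v_2 = \begin{bmatrix} 8 \\ -3 \\ 7 \\ -3 \end{bmatrix},\quad v_3 = \begin{bmatrix} -9 \\ 5 \\ -8 \\ 3 \end{bmatrix},\quad x = \begin{bmatrix} 4 \\ 7 \\ -8 \\ 3 \end{bmatrix}$$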


[M] Exercises 37 and 38 concern the crystal lattice for titanium,
which has the hexagonal structure shown on the left in the
accompanying figure. The vectors

$$\begin{bmatrix} 2.6 \\ -1.5 \\ 0 \end{bmatrix},\quad \begin{bmatrix} 0 \\ 3 \\ 0 \end{bmatrix},\quad \begin{bmatrix} 0 \\ 0 \\ 4.8 \end{bmatrix}$$

in $\mathbb{R}^3$ form a basis for the unit cell shown on the right. The numbers
here are Ångström units (1 Å = $10^{-8}$ cm). In alloys of titanium,
some additional atoms may be in the unit cell at the octahedral
and tetrahedral sites (so named because of the geometric objects
formed by atoms at these locations).



The hexagonal close-packed lattice and its unit cell. 


37. One of the octahedral sites is $\begin{bmatrix} 1/2 \\ 1/4 \\ 1/6 \end{bmatrix}$, relative to the lattice
basis. Determine the coordinates of this site relative to the
standard basis of $\mathbb{R}^3$.

38. One of the tetrahedral sites is $\begin{bmatrix} 1/2 \\ 1/2 \\ 1/3 \end{bmatrix}$. Determine the
coordinates of this site relative to the standard basis of $\mathbb{R}^3$.


SOLUTIONS TO PRACTICE PROBLEMS 


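1. a. It is evident that the matrix $P_\mathcal{B} = [\,b_1 \;\; b_2 \;\; b_3\,]$ is row-equivalent to the identity
matrix. By the Invertible Matrix Theorem, $P_\mathcal{B}$ is invertible and its columns form
a basis for $\mathbb{R}^3$.

b. From part (a), the change-of-coordinates matrix is $P_\mathcal{B} = \begin{bmatrix} 1 & -3 & 3 \\ 0 & 4 & -6 \\ 0 & 0 & 3 \end{bmatrix}$.

c. $x = P_\mathcal{B}[x]_\mathcal{B}$

d. To solve the equation in (c), it is probably easier to row reduce an augmented
matrix than to compute $P_\mathcal{B}^{-1}$:

$$\begin{bmatrix} 1 & -3 & 3 & -8 \\ 0 & 4 & -6 & 2 \\ 0 & 0 & 3 & 3 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & -5 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$

Here the left matrix is $[\,P_\mathcal{B} \;\; x\,]$ and the right matrix is $[\,I \;\; [x]_\mathcal{B}\,]$. Hence

$$[x]_\mathcal{B} = \begin{bmatrix} -5 \\ 2 \\ 1 \end{bmatrix}$$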


2. The coordinates of $p(t) = 6 + 3t - t^2$ with respect to $\mathcal{B}$ satisfy

$$c_1(1 + t) + c_2(1 + t^2) + c_3(t + t^2) = 6 + 3t - t^2$$

Equating coefficients of like powers of t, we have

$$\begin{aligned} c_1 + c_2 &= 6 \\ c_1 + c_3 &= 3 \\ c_2 + c_3 &= -1 \end{aligned}$$

Solving, we find that $c_1 = 5$, $c_2 = 1$, $c_3 = -2$, and $[p]_\mathcal{B} = \begin{bmatrix} 5 \\ 1 \\ -2 \end{bmatrix}$.
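
Both practice solutions reduce to solving a square linear system for a coordinate vector. The following is a minimal sketch of that row reduction, not part of the text's method: the helper name `solve_coordinates` is hypothetical, and it assumes the coefficient matrix is invertible and uses exact rational arithmetic from Python's standard `fractions` module.

```python
from fractions import Fraction

def solve_coordinates(augmented):
    """Row reduce [P | x] to [I | c] and return c (assumes P is invertible)."""
    m = [[Fraction(v) for v in row] for row in augmented]
    n = len(m)
    for c in range(n):
        # Find a row at or below position c with a nonzero entry in column c.
        p = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        # Clear every other entry in the pivot column.
        for i in range(n):
            if i != c and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[c])]
    return [row[-1] for row in m]

# Problem 1(d): augmented matrix [P_B | x].
x_B = solve_coordinates([[1, -3, 3, -8], [0, 4, -6, 2], [0, 0, 3, 3]])
# Problem 2: the three coefficient equations for c1, c2, c3.
p_B = solve_coordinates([[1, 1, 0, 6], [1, 0, 1, 3], [0, 1, 1, -1]])
print([int(v) for v in x_B], [int(v) for v in p_B])
```

Exact fractions are used here only so that the result can be compared with the hand computation without rounding.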


4.5 THE DIMENSION OF A VECTOR SPACE 

Theorem 8 in Section 4.4 implies that a vector space V with a basis $\mathcal{B}$ containing n
vectors is isomorphic to $\mathbb{R}^n$. This section shows that this number n is an intrinsic
property (called the dimension) of the space V that does not depend on the particular
choice of basis. The discussion of dimension will give additional insight into properties
of bases.

The first theorem generalizes a well-known result about the vector space $\mathbb{R}^n$.



THEOREM 9 If a vector space V has a basis $\mathcal{B} = \{b_1, \dots, b_n\}$, then any set in V containing
more than n vectors must be linearly dependent.


PROOF Let $\{u_1, \dots, u_p\}$ be a set in V with more than n vectors. The coordinate vectors
$[u_1]_\mathcal{B}, \dots, [u_p]_\mathcal{B}$ form a linearly dependent set in $\mathbb{R}^n$, because there are more vectors
(p) than entries (n) in each vector. So there exist scalars $c_1, \dots, c_p$, not all zero, such
that

$$c_1[u_1]_\mathcal{B} + \cdots + c_p[u_p]_\mathcal{B} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \qquad \text{(the zero vector in } \mathbb{R}^n\text{)}$$

Since the coordinate mapping is a linear transformation,

$$[c_1 u_1 + \cdots + c_p u_p]_\mathcal{B} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$$

The zero vector on the right displays the n weights needed to build the vector
$c_1 u_1 + \cdots + c_p u_p$ from the basis vectors in $\mathcal{B}$. That is,
$c_1 u_1 + \cdots + c_p u_p = 0 \cdot b_1 + \cdots + 0 \cdot b_n = 0$.
Since the $c_i$ are not all zero, $\{u_1, \dots, u_p\}$ is linearly dependent.¹ ■


Theorem 9 implies that if a vector space V has a basis $\mathcal{B} = \{b_1, \dots, b_n\}$, then each
linearly independent set in V has no more than n vectors.


¹ Theorem 9 also applies to infinite sets in V. An infinite set is said to be linearly dependent if some finite
subset is linearly dependent; otherwise, the set is linearly independent. If S is an infinite set in V, take any
subset $\{u_1, \dots, u_p\}$ of S, with p > n. The proof above shows that this subset is linearly dependent, and
hence so is S.


THEOREM 10 If a vector space V has a basis of n vectors, then every basis of V must consist of
exactly n vectors.


PROOF Let $\mathcal{B}_1$ be a basis of n vectors and $\mathcal{B}_2$ be any other basis (of V). Since $\mathcal{B}_1$ is
a basis and $\mathcal{B}_2$ is linearly independent, $\mathcal{B}_2$ has no more than n vectors, by Theorem 9.
Also, since $\mathcal{B}_2$ is a basis and $\mathcal{B}_1$ is linearly independent, $\mathcal{B}_2$ has at least n vectors. Thus
$\mathcal{B}_2$ consists of exactly n vectors. ■

If a nonzero vector space V is spanned by a finite set S, then a subset of S is a
basis for V, by the Spanning Set Theorem. In this case, Theorem 10 ensures that the
following definition makes sense.


DEFINITION If V is spanned by a finite set, then V is said to be finite-dimensional, and the
dimension of V, written as dim V, is the number of vectors in a basis for V. The
dimension of the zero vector space {0} is defined to be zero. If V is not spanned
by a finite set, then V is said to be infinite-dimensional.


EXAMPLE 1 The standard basis for $\mathbb{R}^n$ contains n vectors, so $\dim \mathbb{R}^n = n$. The
standard polynomial basis $\{1, t, t^2\}$ shows that $\dim \mathbb{P}_2 = 3$. In general, $\dim \mathbb{P}_n = n + 1$.
The space $\mathbb{P}$ of all polynomials is infinite-dimensional (Exercise 27). ■


EXAMPLE 2 Let $H = \operatorname{Span}\{v_1, v_2\}$, where $v_1 = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix}$ and $v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$. Then
H is the plane studied in Example 7 in Section 4.4. A basis for H is $\{v_1, v_2\}$, since $v_1$
and $v_2$ are not multiples and hence are linearly independent. Thus dim H = 2. ■


EXAMPLE 3 Find the dimension of the subspace

$$H = \left\{ \begin{bmatrix} a - 3b + 6c \\ 5a + 4d \\ b - 2c - d \\ 5d \end{bmatrix} : a, b, c, d \text{ in } \mathbb{R} \right\}$$

SOLUTION It is easy to see that H is the set of all linear combinations of the vectors

$$v_1 = \begin{bmatrix} 1 \\ 5 \\ 0 \\ 0 \end{bmatrix},\quad v_2 = \begin{bmatrix} -3 \\ 0 \\ 1 \\ 0 \end{bmatrix},\quad v_3 = \begin{bmatrix} 6 \\ 0 \\ -2 \\ 0 \end{bmatrix},\quad v_4 = \begin{bmatrix} 0 \\ 4 \\ -1 \\ 5 \end{bmatrix}$$

Clearly, $v_1 \neq 0$, $v_2$ is not a multiple of $v_1$, but $v_3$ is a multiple of $v_2$. By the Spanning
Set Theorem, we may discard $v_3$ and still have a set that spans H. Finally, $v_4$ is not a
linear combination of $v_1$ and $v_2$. So $\{v_1, v_2, v_4\}$ is linearly independent (by Theorem 4
in Section 4.3) and hence is a basis for H. Thus dim H = 3. ■


EXAMPLE 4 The subspaces of $\mathbb{R}^3$ can be classified by dimension. See Fig. 1.

0-dimensional subspaces. Only the zero subspace.

1-dimensional subspaces. Any subspace spanned by a single nonzero vector. Such
subspaces are lines through the origin.

2-dimensional subspaces. Any subspace spanned by two linearly independent
vectors. Such subspaces are planes through the origin.

3-dimensional subspaces. Only $\mathbb{R}^3$ itself. Any three linearly independent vectors
in $\mathbb{R}^3$ span all of $\mathbb{R}^3$, by the Invertible Matrix Theorem. ■


FIGURE 1 Sample subspaces of $\mathbb{R}^3$.


Subspaces of a Finite-Dimensional Space 

The next theorem is a natural counterpart to the Spanning Set Theorem. 


THEOREM 11 Let H be a subspace of a finite-dimensional vector space V. Any linearly
independent set in H can be expanded, if necessary, to a basis for H. Also, H is
finite-dimensional and

$$\dim H \le \dim V$$


PROOF If H = {0}, then certainly $\dim H = 0 \le \dim V$. Otherwise, let $S = \{u_1, \dots, u_k\}$
be any linearly independent set in H. If S spans H, then S is a basis for H.
Otherwise, there is some $u_{k+1}$ in H that is not in Span S. But then $\{u_1, \dots, u_k, u_{k+1}\}$
will be linearly independent, because no vector in the set can be a linear combination of
vectors that precede it (by Theorem 4).

So long as the new set does not span H, we can continue this process of expanding
S to a larger linearly independent set in H. But the number of vectors in a linearly
independent expansion of S can never exceed the dimension of V, by Theorem 9.
So eventually the expansion of S will span H and hence will be a basis for H, and
$\dim H \le \dim V$. ■


When the dimension of a vector space or subspace is known, the search for a basis 
is simplified by the next theorem. It says that if a set has the right number of elements, 
then one has only to show either that the set is linearly independent or that it spans the 
space. The theorem is of critical importance in numerous applied problems (involving 
differential equations or difference equations, for example) where linear independence 
is much easier to verify than spanning. 


THEOREM 12 The Basis Theorem

Let V be a p-dimensional vector space, $p \ge 1$. Any linearly independent set of
exactly p elements in V is automatically a basis for V. Any set of exactly p
elements that spans V is automatically a basis for V.

PROOF By Theorem 11, a linearly independent set S of p elements can be extended to
a basis for V. But that basis must contain exactly p elements, since dim V = p. So S
must already be a basis for V. Now suppose that S has p elements and spans V. Since
V is nonzero, the Spanning Set Theorem implies that a subset S′ of S is a basis of V.
Since dim V = p, S′ must contain p vectors. Hence S = S′. ■

The Dimensions of Nul A and Col A

Since the pivot columns of a matrix A form a basis for Col A, we know the dimension
of Col A as soon as we know the pivot columns. The dimension of Nul A might seem to
require more work, since finding a basis for Nul A usually takes more time than a basis
for Col A. But there is a shortcut!

Let A be an $m \times n$ matrix, and suppose the equation Ax = 0 has k free variables.
From Section 4.2, we know that the standard method of finding a spanning set for Nul A
will produce exactly k linearly independent vectors, say $u_1, \dots, u_k$, one for each
free variable. So $\{u_1, \dots, u_k\}$ is a basis for Nul A, and the number of free variables
determines the size of the basis. Let us summarize these facts for future reference.


The dimension of Nul A is the number of free variables in the equation Ax = 0, 
and the dimension of Col A is the number of pivot columns in A. 


EXAMPLE 5 Find the dimensions of the null space and the column space of

$$A = \begin{bmatrix} -3 & 6 & -1 & 1 & -7 \\ 1 & -2 & 2 & 3 & -1 \\ 2 & -4 & 5 & 8 & -4 \end{bmatrix}$$

SOLUTION Row reduce the augmented matrix $[\,A \;\; 0\,]$ to echelon form:

$$\begin{bmatrix} 1 & -2 & 2 & 3 & -1 & 0 \\ 0 & 0 & 1 & 2 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

There are three free variables: $x_2$, $x_4$, and $x_5$. Hence the dimension of Nul A is 3. Also,
dim Col A = 2 because A has two pivot columns. ■
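
The pivot count in Example 5 can be checked mechanically. The sketch below is an illustration only: the function name `pivot_columns` is hypothetical, and exact `Fraction` arithmetic is an assumption chosen so that no rounding affects which entries count as zero.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Return the (0-based) indices of the pivot columns via forward elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c corresponds to a free variable
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots

A = [[-3, 6, -1, 1, -7],
     [1, -2, 2, 3, -1],
     [2, -4, 5, 8, -4]]
pivots = pivot_columns(A)
dim_col = len(pivots)              # number of pivot columns
dim_nul = len(A[0]) - len(pivots)  # number of free variables
print(pivots, dim_col, dim_nul)
```

The indices are 0-based, so pivot indices 0 and 2 correspond to columns 1 and 3 of A, leaving $x_2$, $x_4$, and $x_5$ free.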


PRACTICE PROBLEMS 

Decide whether each statement is True or False, and give a reason for each answer. Here
V is a nonzero finite-dimensional vector space.

1. If dim V = p and if S is a linearly dependent subset of V, then S contains more
than p vectors.

2. If S spans V and if T is a subset of V that contains more vectors than S, then T is
linearly dependent.

17. $A = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$    18. $A = \begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}$


In Exercises 19 and 20, V is a vector space. Mark each statement
True or False. Justify each answer.

19. a. The number of pivot columns of a matrix equals the
dimension of its column space.

b. A plane in $\mathbb{R}^3$ is a two-dimensional subspace of $\mathbb{R}^3$.

c. The dimension of the vector space $\mathbb{P}_4$ is 4.

d. If dim V = n and S is a linearly independent set in V,
then S is a basis for V.

e. If a set $\{v_1, \dots, v_p\}$ spans a finite-dimensional vector
space V and if T is a set of more than p vectors in V,
then T is linearly dependent.

20. a. $\mathbb{R}^2$ is a two-dimensional subspace of $\mathbb{R}^3$.

b. The number of variables in the equation Ax = 0 equals
the dimension of Nul A.

c. A vector space is infinite-dimensional if it is spanned by
an infinite set.

d. If dim V = n and if S spans V, then S is a basis of V.

e. The only three-dimensional subspace of $\mathbb{R}^3$ is $\mathbb{R}^3$ itself.

21. The first four Hermite polynomials are $1$, $2t$, $-2 + 4t^2$, and
$-12t + 8t^3$. These polynomials arise naturally in the study
of certain important differential equations in mathematical
physics.² Show that the first four Hermite polynomials form
a basis of $\mathbb{P}_3$.

22. The first four Laguerre polynomials are $1$, $1 - t$, $2 - 4t + t^2$,
and $6 - 18t + 9t^2 - t^3$. Show that these polynomials form a
basis of $\mathbb{P}_3$.

23. Let $\mathcal{B}$ be the basis of $\mathbb{P}_3$ consisting of the Hermite polynomials
in Exercise 21, and let $p(t) = -1 + 8t^2 + 8t^3$. Find the
coordinate vector of p relative to $\mathcal{B}$.

24. Let B be the basis of P 2 consisting of the first three 
Laguerre polynomials listed in Exercise 22, and let 
p (0 = s 5t — It 2 . Find the coordinate vector of p relative 
to B. 

25. Let S be a subset of an n-dimensional vector space V, and
suppose S contains fewer than n vectors. Explain why S
cannot span V.

26. Let H be an n-dimensional subspace of an n-dimensional
vector space V. Show that H = V.

27. Explain why the space $\mathbb{P}$ of all polynomials is an
infinite-dimensional space.


2 See Introduction to Functional Analysis, 2d ed., by A. E. Taylor and 
David C. Lay (New York: John Wiley & Sons, 1980), pp. 92-93. Other 
sets of polynomials are discussed there, too. 


Determine 
shown in E 


13. A 


14. A 


For each subspace in Exercises 1-8, (a) find a basis for the 
subspace, and (b) state the dimension. 


s — 2t 
s 1 
3t 


: 5 , ? in RI 


2 . 


2a 

-Ab 


: a, Z? in R > 




_ 2c _ 




~P + ^~ 


3. 


a — b 
b — 3c 

: a, 办 ， c in R 

4. 


—P 

: p, q inR 



a 2b 




_ p + q _ 




r n _ % 


、 




5. 


6 . 


2p + 5r 
—2q + 2r 
—3p + 6r 


: p, r in R 


3a — c 
—b — 3c 
-la + 6Z? + 5c 
— 3ci c 


: a, 办 ， c in R 


7. $\{(a, b, c) : a - 3b + c = 0,\ b - 2c = 0,\ 2b - c = 0\}$

8. $\{(a, b, c, d) : a - 3b + c = 0\}$

9. Find the dimension of the subspace of all vectors in $\mathbb{R}^3$ whose
first and third entries are equal.

10. Find the dimension of the subspace H of R 2 spanned by 


In Exercises 11 and 12, find the dimension of the subspace 
spanned by the given vectors. 


15. A 



1 

2 

3 

0 

0 


3 2" 

-6 5_ 

A = 

0 

0 

1 

0 

1 

16. A = 


0 

0 

0 

1 

0 





1 


-2 


-3 

-5 


10 


15 


4.5 EXERCISES

28. Show that the space $C(\mathbb{R})$ of all continuous functions defined
on the real line is an infinite-dimensional space.

In Exercises 29 and 30, V is a nonzero finite-dimensional vector
space, and the vectors listed belong to V. Mark each statement
True or False. Justify each answer. (These questions are more
difficult than those in Exercises 19 and 20.)

29. a. If there exists a set $\{v_1, \dots, v_p\}$ that spans V, then
$\dim V \le p$.

b. If there exists a linearly independent set $\{v_1, \dots, v_p\}$ in
V, then $\dim V \ge p$.

c. If dim V = p, then there exists a spanning set of p + 1
vectors in V.

30. a. If there exists a linearly dependent set $\{v_1, \dots, v_p\}$ in V,
then $\dim V \le p$.

b. If every set of p elements in V fails to span V, then
dim V > p.

c. If $p \ge 2$ and dim V = p, then every set of p − 1 nonzero
vectors is linearly independent.


Exercises 31 and 32 concern finite-dimensional vector spaces V
and W and a linear transformation $T : V \to W$.


31. Let H be a nonzero subspace of V, and let T(H) be the set of
images of vectors in H. Then T(H) is a subspace of W, by
Exercise 35 in Section 4.2. Prove that $\dim T(H) \le \dim H$.

32. Let H be a nonzero subspace of V, and suppose T is
a one-to-one (linear) mapping of V into W. Prove that
dim T(H) = dim H. If T happens to be a one-to-one mapping
of V onto W, then dim V = dim W. Isomorphic
finite-dimensional vector spaces have the same dimension.


33. [M] According to Theorem 11, a linearly independent set
$\{v_1, \dots, v_k\}$ in $\mathbb{R}^n$ can be expanded to a basis for $\mathbb{R}^n$. One
way to do this is to create $A = [\,v_1 \;\cdots\; v_k \;\; e_1 \;\cdots\; e_n\,]$,
with $e_1, \dots, e_n$ the columns of the identity matrix; the pivot
columns of A form a basis for $\mathbb{R}^n$.

a. Use the method described to extend the following vectors
to a basis for $\mathbb{R}^5$:

$$v_1 = \begin{bmatrix} -9 \\ -7 \\ 8 \\ -5 \\ 7 \end{bmatrix},\quad v_2 = \begin{bmatrix} 9 \\ 4 \\ 1 \\ 6 \\ -7 \end{bmatrix},\quad v_3 = \begin{bmatrix} 6 \\ 7 \\ -8 \\ 5 \\ -7 \end{bmatrix}$$

b. Explain why the method works in general: Why are the
original vectors $v_1, \dots, v_k$ included in the basis found for
Col A? Why is Col A = $\mathbb{R}^n$?
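
The procedure described in Exercise 33 can be sketched in code on a small case. Everything named here is hypothetical (`pivot_columns`, `extend_to_basis`, and the two sample vectors in $\mathbb{R}^3$ are assumptions for illustration, not the data of part (a)); the sketch only shows the mechanics of forming $[\,v_1 \cdots v_k \;\; e_1 \cdots e_n\,]$ and keeping its pivot columns.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Return the (0-based) indices of the pivot columns via forward elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots

def extend_to_basis(vectors):
    """Extend linearly independent vectors in R^n to a basis of R^n."""
    n = len(vectors[0])
    identity = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    columns = list(vectors) + identity
    # Build A = [v1 ... vk e1 ... en] row by row.
    A = [[col[i] for col in columns] for i in range(n)]
    return [tuple(columns[j]) for j in pivot_columns(A)]

# Hypothetical sample input: two independent vectors in R^3.
basis = extend_to_basis([(1, 1, 0), (0, 1, 1)])
print(basis)
```

Because the given vectors come first among the columns, elimination reaches them before any $e_j$, which is why they are kept whenever they are linearly independent.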


34. [M] Let $\mathcal{B} = \{1, \cos t, \cos^2 t, \dots, \cos^6 t\}$ and $\mathcal{C} = \{1, \cos t,
\cos 2t, \dots, \cos 6t\}$. Assume the following trigonometric
identities (see Exercise 37 in Section 4.1).

$$\begin{aligned}
\cos 2t &= -1 + 2\cos^2 t \\
\cos 3t &= -3\cos t + 4\cos^3 t \\
\cos 4t &= 1 - 8\cos^2 t + 8\cos^4 t \\
\cos 5t &= 5\cos t - 20\cos^3 t + 16\cos^5 t \\
\cos 6t &= -1 + 18\cos^2 t - 48\cos^4 t + 32\cos^6 t
\end{aligned}$$

Let H be the subspace of functions spanned by the functions
in $\mathcal{B}$. Then $\mathcal{B}$ is a basis for H, by Exercise 38 in Section 4.3.

a. Write the $\mathcal{B}$-coordinate vectors of the vectors in $\mathcal{C}$, and
use them to show that $\mathcal{C}$ is a linearly independent set in
H.

b. Explain why $\mathcal{C}$ is a basis for H.


SOLUTIONS TO PRACTICE PROBLEMS 

1. False. Consider the set {0}. 

2. True. By the Spanning Set Theorem, S contains a basis for V; call that basis S′.
Then T will contain more vectors than S′. By Theorem 9, T is linearly dependent.


4.6 RANK 


With the aid of vector space concepts, this section takes a look inside a matrix and 
reveals several interesting and useful relationships hidden in its rows and columns. 

For instance, imagine placing 2000 random numbers into a 40 × 50 matrix A and
then determining both the maximum number of linearly independent columns in A and
the maximum number of linearly independent columns in $A^T$ (rows in A). Remarkably,
the two numbers are the same. As we'll soon see, their common value is the rank of the
matrix. To explain why, we need to examine the subspace spanned by the rows of A.

The Row Space

If A is an $m \times n$ matrix, each row of A has n entries and thus can be identified with a
vector in $\mathbb{R}^n$. The set of all linear combinations of the row vectors is called the row
space of A and is denoted by Row A. Each row has n entries, so Row A is a subspace
of $\mathbb{R}^n$. Since the rows of A are identified with the columns of $A^T$, we could also write
Col $A^T$ in place of Row A.


EXAMPLE 1 Let

$$A = \begin{bmatrix} -2 & -5 & 8 & 0 & -17 \\ 1 & 3 & -5 & 1 & 5 \\ 3 & 11 & -19 & 7 & 1 \\ 1 & 7 & -13 & 5 & -3 \end{bmatrix} \quad\text{and}\quad \begin{aligned} r_1 &= (-2, -5, 8, 0, -17) \\ r_2 &= (1, 3, -5, 1, 5) \\ r_3 &= (3, 11, -19, 7, 1) \\ r_4 &= (1, 7, -13, 5, -3) \end{aligned}$$

The row space of A is the subspace of $\mathbb{R}^5$ spanned by $\{r_1, r_2, r_3, r_4\}$. That is, Row A =
Span$\{r_1, r_2, r_3, r_4\}$. It is natural to write row vectors horizontally; however, they may
also be written as column vectors if that is more convenient. ■


If we knew some linear dependence relations among the rows of matrix A in 
Example 1, we could use the Spanning Set Theorem to shrink the spanning set to a 
basis. Unfortunately, row operations on A will not give us that information, because 
row operations change the row-dependence relations. But row reducing A is certainly 
worthwhile, as the next theorem shows! 


THEOREM 13 If two matrices A and B are row equivalent, then their row spaces are the same. 

If B is in echelon form, the nonzero rows of B form a basis for the row space of 
A as well as for that of B. 


PROOF If B is obtained from A by row operations, the rows of B are linear combinations
of the rows of A. It follows that any linear combination of the rows of B
is automatically a linear combination of the rows of A. Thus the row space of B is
contained in the row space of A. Since row operations are reversible, the same argument
shows that the row space of A is a subset of the row space of B. So the two row spaces
are the same. If B is in echelon form, its nonzero rows are linearly independent because
no nonzero row is a linear combination of the nonzero rows below it. (Apply Theorem
4 to the nonzero rows of B in reverse order, with the first row last.) Thus the nonzero
rows of B form a basis of the (common) row space of B and A. ■

The main result of this section involves the three spaces: Row A, Col A, and Nul A.
The following example prepares the way for this result and shows how one sequence of
row operations on A leads to bases for all three spaces.


EXAMPLE 2 Find bases for the row space, the column space, and the null space of
the matrix

$$A = \begin{bmatrix} -2 & -5 & 8 & 0 & -17 \\ 1 & 3 & -5 & 1 & 5 \\ 3 & 11 & -19 & 7 & 1 \\ 1 & 7 & -13 & 5 & -3 \end{bmatrix}$$

SOLUTION To find bases for the row space and the column space, row reduce A to an
echelon form:

$$A \sim B = \begin{bmatrix} 1 & 3 & -5 & 1 & 5 \\ 0 & 1 & -2 & 2 & -7 \\ 0 & 0 & 0 & -4 & 20 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

By Theorem 13, the first three rows of B form a basis for the row space of A (as well
as for the row space of B). Thus

Basis for Row A: $\{(1, 3, -5, 1, 5),\ (0, 1, -2, 2, -7),\ (0, 0, 0, -4, 20)\}$

For the column space, observe from B that the pivots are in columns 1, 2, and 4. Hence
columns 1, 2, and 4 of A (not B) form a basis for Col A:

Basis for Col A: $\left\{ \begin{bmatrix} -2 \\ 1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} -5 \\ 3 \\ 11 \\ 7 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 7 \\ 5 \end{bmatrix} \right\}$

Notice that any echelon form of A provides (in its nonzero rows) a basis for Row A
and also identifies the pivot columns of A for Col A. However, for Nul A, we need the
reduced echelon form. Further row operations on B yield

$$A \sim B \sim C = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & -2 & 0 & 3 \\ 0 & 0 & 0 & 1 & -5 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

The equation Ax = 0 is equivalent to Cx = 0, that is,

$$\begin{aligned} x_1 + x_3 + x_5 &= 0 \\ x_2 - 2x_3 + 3x_5 &= 0 \\ x_4 - 5x_5 &= 0 \end{aligned}$$

So $x_1 = -x_3 - x_5$, $x_2 = 2x_3 - 3x_5$, $x_4 = 5x_5$, with $x_3$ and $x_5$ free variables. The usual
calculations (discussed in Section 4.2) show that

Basis for Nul A: $\left\{ \begin{bmatrix} -1 \\ 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ -3 \\ 0 \\ 5 \\ 1 \end{bmatrix} \right\}$

Observe that, unlike the basis for Col A, the bases for Row A and Nul A have no simple
connection with the entries in A itself.¹ ■
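
The three bases in Example 2 can be recomputed from a single reduced echelon form. The sketch below is an illustration under stated assumptions: `rref` and `three_bases` are hypothetical helper names, arithmetic is exact (`fractions.Fraction`), and the row-space basis is read from the reduced echelon form C rather than from the echelon form B used in the text, so its vectors differ from those listed above while spanning the same space.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, 0-based pivot column indices)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]
        for i in range(len(m)):
            if i != r:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def three_bases(A):
    """Bases for Row A, Col A, and Nul A, each as a list of tuples."""
    C, pivots = rref(A)
    n = len(A[0])
    row_basis = [tuple(C[i]) for i in range(len(pivots))]     # nonzero rows of C
    col_basis = [tuple(row[j] for row in A) for j in pivots]  # pivot columns of A
    nul_basis = []
    for f in (j for j in range(n) if j not in pivots):        # one vector per free variable
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for i, j in enumerate(pivots):
            x[j] = -C[i][f]
        nul_basis.append(tuple(x))
    return row_basis, col_basis, nul_basis

A = [[-2, -5, 8, 0, -17],
     [1, 3, -5, 1, 5],
     [3, 11, -19, 7, 1],
     [1, 7, -13, 5, -3]]
row_basis, col_basis, nul_basis = three_bases(A)
```

Each null-space vector sets one free variable to 1 and the others to 0, which is the parametric-vector-form calculation from Section 4.2.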


¹ It is possible to find a basis for the row space Row A that uses rows of A. First form $A^T$, and then row
reduce until the pivot columns of $A^T$ are found. These pivot columns of $A^T$ are rows of A, and they form
a basis for the row space of A.


Warning: Although the first three rows of B in Example 2 are linearly independent,
it is wrong to conclude that the first three rows of A are linearly independent. (In fact,
the third row of A is 2 times the first row plus 7 times the second row.) Row operations
may change the linear dependence relations among the rows of a matrix.


The Rank Theorem 

The next theorem describes fundamental relations among the dimensions of Col A, 
Row A, and Nul A. 


DEFINITION The rank of A is the dimension of the column space of A.


Since Row A is the same as Col A T , the dimension of the row space of A is the rank 
of A T . The dimension of the null space is sometimes called the nullity of A, though we 
will not use this term. 

An alert reader may have already discovered part or all of the next theorem while 
working the exercises in Section 4.5 or reading Example 2 above. 


THEOREM 14 The Rank Theorem

The dimensions of the column space and the row space of an $m \times n$ matrix A are
equal. This common dimension, the rank of A, also equals the number of pivot
positions in A and satisfies the equation

$$\operatorname{rank} A + \dim \operatorname{Nul} A = n$$


PROOF By Theorem 6 in Section 4.3, rank A is the number of pivot columns in A.
Equivalently, rank A is the number of pivot positions in an echelon form B of A.
Furthermore, since B has a nonzero row for each pivot, and since these rows form a
basis for the row space of A, the rank of A is also the dimension of the row space.

From Section 4.5, the dimension of Nul A equals the number of free variables in
the equation Ax = 0. Expressed another way, the dimension of Nul A is the number of
columns of A that are not pivot columns. (It is the number of these columns, not the
columns themselves, that is related to Nul A.) Obviously,

$$\left\{ \begin{matrix} \text{number of} \\ \text{pivot columns} \end{matrix} \right\} + \left\{ \begin{matrix} \text{number of} \\ \text{nonpivot columns} \end{matrix} \right\} = \left\{ \begin{matrix} \text{number of} \\ \text{columns} \end{matrix} \right\}$$

This proves the theorem. ■

The ideas behind Theorem 14 are visible in the calculations in Example 2. The 
three pivot positions in the echelon form B determine the basic variables and identify 
the basis vectors for Col A and those for Row A. 
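
As a quick numerical illustration of the Rank Theorem on the matrix of Example 2, the sketch below compares the pivot counts of A and of its transpose. The function name `rank` is a hypothetical helper, exact `Fraction` arithmetic is assumed, and dim Nul A is taken to be the number of nonpivot columns, as in the proof above.

```python
from fractions import Fraction

def rank(rows):
    """Count pivots found by forward elimination in exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [[-2, -5, 8, 0, -17],
     [1, 3, -5, 1, 5],
     [3, 11, -19, 7, 1],
     [1, 7, -13, 5, -3]]
A_T = [list(col) for col in zip(*A)]  # transpose: rows of A become columns

rank_A = rank(A)
rank_A_T = rank(A_T)
dim_nul_A = len(A[0]) - rank_A        # nonpivot columns of A
print(rank_A, rank_A_T, dim_nul_A)
```

The two ranks agree even though A is 4 × 5 and its transpose is 5 × 4, and the rank plus the number of nonpivot columns equals the number of columns of A.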


EXAMPLE 3

a. If A is a 7 × 9 matrix with a two-dimensional null space, what is the rank of A?

b. Could a 6 × 9 matrix have a two-dimensional null space?

SOLUTION

a. Since A has 9 columns, (rank A) + 2 = 9, and hence rank A = 7.

b. No. If a 6 × 9 matrix, call it B, had a two-dimensional null space, it would have to
have rank 7, by the Rank Theorem. But the columns of B are vectors in $\mathbb{R}^6$, and so
the dimension of Col B cannot exceed 6; that is, rank B cannot exceed 6. ■

The next example provides a nice way to visualize the subspaces we have been
studying. In Chapter 6, we will learn that Row A and Nul A have only the zero vector
in common and are actually "perpendicular" to each other. The same fact will apply
to Row $A^T$ (= Col A) and Nul $A^T$. So Fig. 1, which accompanies Example 4, creates
a good mental image for the general case. (The value of studying $A^T$ along with A is
demonstrated in Exercise 29.)

EXAMPLE 4 Let $A = \begin{bmatrix} 3 & 0 & -1 \\ 3 & 0 & -1 \\ 4 & 0 & 5 \end{bmatrix}$. It is readily checked that Nul A is the $x_2$-axis,
Row A is the $x_1x_3$-plane, Col A is the plane whose equation is $x_1 - x_2 = 0$, and
Nul $A^T$ is the set of all multiples of (1, −1, 0). Figure 1 shows Nul A and Row A in
the domain of the linear transformation $x \mapsto Ax$; the range of this mapping, Col A, is
shown in a separate copy of $\mathbb{R}^3$, along with Nul $A^T$. ■



FIGURE 1 Subspaces determined by a matrix A. 


Applications to Systems of Equations 

The Rank Theorem is a powerful tool for processing information about systems of 
linear equations. The next example simulates the way a real-life problem using linear 
equations might be stated, without explicit mention of linear algebra terms such as 
matrix, subspace, and dimension. 

EXAMPLE 5 A scientist has found two solutions to a homogeneous system of 
40 equations in 42 variables. The two solutions are not multiples, and all other solutions 
can be constructed by adding together appropriate multiples of these two solutions. 
Can the scientist be certain that an associated nonhomogeneous system (with the same 
coefficients) has a solution? 

SOLUTION Yes. Let A be the 40 × 42 coefficient matrix of the system. The given
information implies that the two solutions are linearly independent and span Nul A. So
dim Nul A = 2. By the Rank Theorem, dim Col A = 42 − 2 = 40. Since $\mathbb{R}^{40}$ is the
only subspace of $\mathbb{R}^{40}$ whose dimension is 40, Col A must be all of $\mathbb{R}^{40}$. This means that
every nonhomogeneous equation Ax = b has a solution. ■

Rank and the Invertible Matrix Theorem

The various vector space concepts associated with a matrix provide several more 
statements for the Invertible Matrix Theorem. The new statements listed here follow 
those in the original Invertible Matrix Theorem in Section 2.3. 


THEOREM The Invertible Matrix Theorem (continued)

Let A be an $n \times n$ matrix. Then the following statements are each equivalent to
the statement that A is an invertible matrix.

m. The columns of A form a basis of $\mathbb{R}^n$.

n. Col A = $\mathbb{R}^n$

o. dim Col A = n

p. rank A = n

q. Nul A = {0}

r. dim Nul A = 0


PROOF Statement (m) is logically equivalent to statements (e) and (h) regarding linear
independence and spanning. The other five statements are linked to the earlier ones of
the theorem by the following chain of almost trivial implications:

(g) ⇒ (n) ⇒ (o) ⇒ (p) ⇒ (r) ⇒ (q) ⇒ (d)

Statement (g), which says that the equation Ax = b has at least one solution for each b in
$\mathbb{R}^n$, implies (n), because Col A is precisely the set of all b such that the equation Ax = b
is consistent. The implications (n) ⇒ (o) ⇒ (p) follow from the definitions of dimension
and rank. If the rank of A is n, the number of columns of A, then dim Nul A = 0, by the
Rank Theorem, and so Nul A = {0}. Thus (p) ⇒ (r) ⇒ (q). Also, (q) implies that the
equation Ax = 0 has only the trivial solution, which is statement (d). Since statements
(d) and (g) are already known to be equivalent to the statement that A is invertible, the
proof is complete. ■

SG: Expanded Table for the IMT 4-19

We have refrained from adding to the Invertible Matrix Theorem obvious statements
about the row space of A, because the row space is the column space of $A^T$.
Recall from statement (l) of the Invertible Matrix Theorem that A is invertible if and
only if $A^T$ is invertible. Hence every statement in the Invertible Matrix Theorem can
also be stated for $A^T$. To do so would double the length of the theorem and produce a
list of over 30 statements!


NUMERICAL NOTE

Many algorithms discussed in this text are useful for understanding concepts
and making simple computations by hand. However, the algorithms are often
unsuitable for large-scale problems in real life.

Rank determination is a good example. It would seem easy to reduce a matrix
to echelon form and count the pivots. But unless exact arithmetic is performed
on a matrix whose entries are specified exactly, row operations can change the
apparent rank of a matrix. For instance, if the value of x in the matrix
$\begin{bmatrix} 5 & 7 \\ 5 & x \end{bmatrix}$
is not stored exactly as 7 in a computer, then the rank may be 1 or 2, depending
on whether the computer treats x − 7 as zero.

In practical applications, the effective rank of a matrix A is often determined
from the singular value decomposition of A, to be discussed in Section 7.4. This
decomposition is also a reliable source of bases for Col A, Row A, Nul A, and
Nul $A^T$.
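
The effect described in the Numerical Note can be seen in a few lines of floating-point code. This is a sketch under assumptions, not the singular value approach mentioned above: the function `numerical_rank`, its `tol` parameter, and the perturbation 1e-12 are all hypothetical choices made for illustration.

```python
def numerical_rank(rows, tol):
    """Count pivots in floating point, treating |entry| <= tol as zero."""
    m = [[float(v) for v in row] for row in rows]
    rank = 0
    for c in range(len(m[0])):
        if rank == len(m):
            break
        # Partial pivoting: pick the largest available entry in column c.
        p = max(range(rank, len(m)), key=lambda i: abs(m[i][c]))
        if abs(m[p][c]) <= tol:
            continue  # column treated as having no pivot
        m[rank], m[p] = m[p], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][c] / m[rank][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

x = 7 + 1e-12  # a value that is "almost" 7 (assumed perturbation)
M = [[5, 7], [5, x]]
rank_strict = numerical_rank(M, tol=0.0)          # x - 7 counted as nonzero
rank_tolerant = numerical_rank(M, tol=1e-9)       # x - 7 counted as zero
rank_exact = numerical_rank([[5, 7], [5, 7]], tol=0.0)
print(rank_strict, rank_tolerant, rank_exact)
```

The same matrix is reported as rank 2 or rank 1 depending only on the tolerance, which is the ambiguity the note warns about.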


PRACTICE PROBLEMS 


The matrices below are row equivalent.

$$A = \begin{bmatrix} 2 & -1 & 1 & -6 & 8 \\ 1 & -2 & -4 & 3 & -2 \\ -7 & 8 & 10 & 3 & -10 \\ 4 & -5 & -7 & 0 & 4 \end{bmatrix},\quad B = \begin{bmatrix} 1 & -2 & -4 & 3 & -2 \\ 0 & 3 & 9 & -12 & 12 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

1. Find rank A and dim Nul A.

2. Find bases for Col A and Row A.

3. What is the next step to perform to find a basis for Nul A?

4. How many pivot columns are in a row echelon form of $A^T$?


4.6 EXERCISES 


In Exercises 1-4, assume that the matrix A is row equivalent to B.
Without calculations, list rank A and dim Nul A. Then find bases
for Col A, Row A, and Nul A.

1. $A = \begin{bmatrix} 1 & -4 & 9 & -7 \\ -1 & 2 & -4 & 1 \\ 5 & -6 & 10 & 7 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 & -1 & 5 \\ 0 & -2 & 5 & -6 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

2. $A = \begin{bmatrix} 1 & 3 & 4 & -1 & 2 \\ 2 & 6 & 6 & 0 & -3 \\ 3 & 9 & 3 & 6 & -3 \\ 3 & 9 & 0 & 9 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 3 & 4 & -1 & 2 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 0 & -5 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

3. $A = \begin{bmatrix} 2 & 6 & -6 & 6 & 3 & 6 \\ -2 & -3 & 6 & -3 & 0 & -6 \\ 4 & 9 & -12 & 9 & 3 & 12 \\ -2 & 3 & 6 & 3 & 3 & -6 \end{bmatrix}$, $B = \begin{bmatrix} 2 & 6 & -6 & 6 & 3 & 6 \\ 0 & 3 & 0 & 3 & 3 & 0 \\ 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

4. $A = \begin{bmatrix} 1 & 1 & -2 & 0 & 1 & -2 \\ 1 & 2 & -3 & 0 & -2 & -3 \\ 1 & -1 & 0 & 0 & 1 & 6 \\ 1 & -2 & 2 & 1 & -3 & 0 \\ 1 & -2 & 1 & 0 & 2 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 & -2 & 0 & 1 & -2 \\ 0 & 1 & -1 & 0 & -3 & -1 \\ 0 & 0 & 1 & 1 & -13 & -1 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$

5. If a 4 × 7 matrix A has rank 3, find dim Nul A, dim Row A,
and rank $A^T$.

6. If a 7 × 5 matrix A has rank 2, find dim Nul A, dim Row A,
and rank $A^T$.

7. Suppose a 4 × 7 matrix A has four pivot columns. Is
Col A = $\mathbb{R}^4$? Is Nul A = $\mathbb{R}^3$? Explain your answers.

8. Suppose a 6 × 8 matrix A has four pivot columns. What is
dim Nul A? Is Col A = $\mathbb{R}^4$? Why or why not?

9. If the null space of a 4 × 6 matrix A is 3-dimensional, what
is the dimension of the column space of A? Is Col A = $\mathbb{R}^3$?
Why or why not?

10. If the null space of an 8 × 7 matrix A is 5-dimensional, what
is the dimension of the column space of A?

11. If the null space of an 8 × 5 matrix A is 3-dimensional, what
is the dimension of the row space of A?

12. If the null space of a 5 × 4 matrix A is 2-dimensional, what
is the dimension of the row space of A?

13. If A is a 7 × 5 matrix, what is the largest possible rank of A?
If A is a 5 × 7 matrix, what is the largest possible rank of A?
Explain your answers.

14. If A is a 5 × 4 matrix, what is the largest possible dimension
of the row space of A? If A is a 4 × 5 matrix, what is the
largest possible dimension of the row space of A? Explain.

15. If A is a 3 × 7 matrix, what is the smallest possible dimension
of Nul A?

16. If A is a 7 × 5 matrix, what is the smallest possible dimension
of Nul A?
In Exercises 17 and 18, A is an m x n matrix. Mark each 
statement True or False. Justify each answer. 

17. a. The row space of A is the same as the column space of $A^T$.

b. If B is any echelon form of A, and if B has three nonzero
rows, then the first three rows of A form a basis for
Row A.

c. The dimensions of the row space and the column space 
of A are the same, even if A is not square. 

d. The sum of the dimensions of the row space and the null 
space of A equals the number of rows in A. 

e. On a computer, row operations can change the apparent 
rank of a matrix. 

18. a. If B is any echelon form of A, then the pivot columns of 

B form a basis for the column space of A. 

b. Row operations preserve the linear dependence relations 
among the rows of A. 

c. The dimension of the null space of A is the number of 
columns of A that are not pivot columns. 

d. The row space of $A^T$ is the same as the column space of
A.


e. If A and B are row equivalent, then their row spaces are 
the same. 

19. Suppose the solutions of a homogeneous system of five linear 
equations in six unknowns are all multiples of one nonzero 
solution. Will the system necessarily have a solution for 
every possible choice of constants on the right sides of the 
equations? Explain. 

20. Suppose a nonhomogeneous system of six linear equations 
in eight unknowns has a solution, with two free variables. Is 
it possible to change some constants on the equations’ right 
sides to make the new system inconsistent? Explain. 

21. Suppose a nonhomogeneous system of nine linear equations 
in ten unknowns has a solution for all possible constants on 
the right sides of the equations. Is it possible to find two 
nonzero solutions of the associated homogeneous system that 
are not multiples of each other? Discuss. 

22. Is it possible that all solutions of a homogeneous system of
ten linear equations in twelve variables are multiples of one 
fixed nonzero solution? Discuss. 

23. A homogeneous system of twelve linear equations in eight 
unknowns has two fixed solutions that are not multiples of 
each other, and all other solutions are linear combinations of 
these two solutions. Can the set of all solutions be described 
with fewer than twelve homogeneous linear equations? If so, 
how many? Discuss. 

24. Is it possible for a nonhomogeneous system of seven equa¬ 
tions in six unknowns to have a unique solution for some 
right-hand side of constants? Is it possible for such a system 
to have a unique solution for every right-hand side? Explain. 

25. A scientist solves a nonhomogeneous system of ten linear 
equations in twelve unknowns and finds that three of the 
unknowns are free variables. Can the scientist be certain 
that, if the right sides of the equations are changed, the new 
nonhomogeneous system will have a solution? Discuss. 

26. In statistical theory, a common requirement is that a matrix
be of full rank. That is, the rank should be as large as
possible. Explain why an m × n matrix with more rows than
columns has full rank if and only if its columns are linearly
independent.

Exercises 27-29 concern an m × n matrix A and what are often
called the fundamental subspaces determined by A.

27. Which of the subspaces Row A, Col A, Nul A, Row $A^T$,
Col $A^T$, and Nul $A^T$ are in $\mathbb{R}^m$ and which are in $\mathbb{R}^n$? How
many distinct subspaces are in this list?

28. Justify the following equalities:

a. dim Row A + dim Nul A = n   (number of columns of A)

b. dim Col A + dim Nul $A^T$ = m   (number of rows of A)

29. Use Exercise 28 to explain why the equation $A\mathbf{x} = \mathbf{b}$ has a
solution for all $\mathbf{b}$ in $\mathbb{R}^m$ if and only if the equation $A^T\mathbf{x} = \mathbf{0}$
has only the trivial solution.

30. Suppose A is m × n and $\mathbf{b}$ is in $\mathbb{R}^m$. What has to be true
about the two numbers rank $[\,A \;\; \mathbf{b}\,]$ and rank A in order for
the equation $A\mathbf{x} = \mathbf{b}$ to be consistent?

Rank 1 matrices are important in some computer algorithms and
several theoretical contexts, including the singular value
decomposition in Chapter 7. It can be shown that an m × n matrix A
has rank 1 if and only if it is an outer product; that is, $A = \mathbf{u}\mathbf{v}^T$
for some $\mathbf{u}$ in $\mathbb{R}^m$ and $\mathbf{v}$ in $\mathbb{R}^n$. Exercises 31-33 suggest why this
property is true.

31. Verify that rank $\mathbf{u}\mathbf{v}^T \le 1$ if $\mathbf{u} = \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$.

32. Let $\mathbf{u} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Find $\mathbf{v}$ in $\mathbb{R}^3$ such that $\begin{bmatrix} 1 & -3 & 4 \\ 2 & -6 & 8 \end{bmatrix} = \mathbf{u}\mathbf{v}^T$.

33. Let A be any 2 × 3 matrix such that rank A = 1, let $\mathbf{u}$ be the
first column of A, and suppose $\mathbf{u} \ne \mathbf{0}$. Explain why there
is a vector $\mathbf{v}$ in $\mathbb{R}^3$ such that $A = \mathbf{u}\mathbf{v}^T$. How could this
construction be modified if the first column of A were zero?

34. Let A be an m x n matrix of rank r > 0 and let U be an eche¬ 
lon form of A. Explain why there exists an invertible matrix 
E such that A = EU, and use this factorization to write A 
as the sum of r rank 1 matrices. [Hint: See Theorem 10 in 
Section 2.4.] 


35. [M] Let A 


3 -3 -7 


-5 5 
2 8 
-4 8 
9 3 


a. Construct matrices C and N whose columns are bases for 
Col A and Nul A, respectively, and construct a matrix R 
whose rows form a basis for Row A. 

b. Construct a matrix M whose columns form a basis
for Nul $A^T$, form the matrices $S = [\,R^T \;\; N\,]$ and
$T = [\,C \;\; M\,]$, and explain why S and T should be
square. Verify that both S and T are invertible.

36. [M] Repeat Exercise 35 for a random integer-valued 6x7 
matrix A whose rank is at most 4. One way to make A 
is to create a random integer-valued 6x4 matrix J and a 
random integer-valued 4x7 matrix K, and set A = JK. 
(See Supplementary Exercise 12 at the end of the chapter; 
and see the Study Guide for matrix-generating programs.) 

37. [M] Let A be the matrix in Exercise 35. Construct a matrix 
C whose columns are the pivot columns of A, and construct 
a matrix R whose rows are the nonzero rows of the reduced 
echelon form of A. Compute CR, and discuss what you see. 

38. [M] Repeat Exercise 37 for three random integer-valued 
5x7 matrices A whose ranks are 5, 4, and 3. Make a 
conjecture about how CR is related to A for any matrix A. 
Prove your conjecture. 


SOLUTIONS TO PRACTICE PROBLEMS 


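1. A has two pivot columns, so rank A = 2. Since A has 5 columns altogether,
dim Nul A = 5 - 2 = 3.

2. The pivot columns of A are the first two columns. So a basis for Col A is

\[
\{\mathbf{a}_1, \mathbf{a}_2\} = \left\{ \begin{bmatrix} 2 \\ 1 \\ -7 \\ 4 \end{bmatrix}, \begin{bmatrix} -1 \\ -2 \\ 8 \\ -5 \end{bmatrix} \right\}
\]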


Major Review of Key 
Concepts 4-22 


The nonzero rows of B form a basis for Row A, namely, {(1, -2, -4, 3, -2),
(0, 3, 9, -12, 12)}. In this particular example, it happens that any two rows of A
form a basis for the row space, because the row space is two-dimensional and none
of the rows of A is a multiple of another row. In general, the nonzero rows of an
echelon form of A should be used as a basis for Row A, not the rows of A itself.

3. For Nul A, the next step is to perform row operations on B to obtain the reduced
echelon form of A.

4. Rank $A^T$ = rank A, by the Rank Theorem, because Col $A^T$ = Row A. So $A^T$ has
two pivot positions.
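
The counts in Solution 1 can be checked mechanically. The following is a minimal sketch, not part of the text's method: it row reduces the practice-problem matrix A with exact fractions and counts pivot columns. The helper name `row_echelon` is hypothetical, and exact arithmetic is used deliberately so that the rounding issues mentioned in the Numerical Note do not arise.

```python
from fractions import Fraction

def row_echelon(rows):
    """Return (an echelon form, list of pivot column indices), using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        # Create zeros below the pivot.
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# The matrix A from the practice problems.
A = [[2, -1, 1, -6, 8],
     [1, -2, -4, 3, -2],
     [-7, 8, 10, 3, -10],
     [4, -5, -7, 0, 4]]

_, pivots = row_echelon(A)
rank = len(pivots)             # number of pivot columns
nullity = len(A[0]) - rank     # Rank Theorem: rank A + dim Nul A = n
print(rank, nullity, pivots)
```

The pivot indices are zero-based, so the list identifies the first two columns of A as the pivot columns, in agreement with Solution 2.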




4.7 CHANGE OF BASIS 


When a basis $\mathcal{B}$ is chosen for an n-dimensional vector space V, the associated coordinate
mapping onto $\mathbb{R}^n$ provides a coordinate system for V. Each $\mathbf{x}$ in V is identified uniquely
by its $\mathcal{B}$-coordinate vector $[\mathbf{x}]_{\mathcal{B}}$.¹

In some applications, a problem is described initially using a basis $\mathcal{B}$, but the
problem's solution is aided by changing $\mathcal{B}$ to a new basis $\mathcal{C}$. (Examples will be given in
Chapters 5 and 7.) Each vector is assigned a new $\mathcal{C}$-coordinate vector. In this section,
we study how $[\mathbf{x}]_{\mathcal{C}}$ and $[\mathbf{x}]_{\mathcal{B}}$ are related for each $\mathbf{x}$ in V.

To visualize the problem, consider the two coordinate systems in Fig. 1. In Fig. 1(a),
$\mathbf{x} = 3\mathbf{b}_1 + \mathbf{b}_2$, while in Fig. 1(b), the same $\mathbf{x}$ is shown as $\mathbf{x} = 6\mathbf{c}_1 + 4\mathbf{c}_2$. That is,

\[
[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 3 \\ 1 \end{bmatrix} \quad \text{and} \quad [\mathbf{x}]_{\mathcal{C}} = \begin{bmatrix} 6 \\ 4 \end{bmatrix}
\]

Our problem is to find the connection between the two coordinate vectors. Example 1
shows how to do this, provided we know how $\mathbf{b}_1$ and $\mathbf{b}_2$ are formed from $\mathbf{c}_1$ and $\mathbf{c}_2$.

FIGURE 1 Two coordinate systems for the same vector space. 


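EXAMPLE 1 Consider two bases $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$ for a vector space
V, such that

\[
\mathbf{b}_1 = 4\mathbf{c}_1 + \mathbf{c}_2 \quad \text{and} \quad \mathbf{b}_2 = -6\mathbf{c}_1 + \mathbf{c}_2 \tag{1}
\]

Suppose

\[
\mathbf{x} = 3\mathbf{b}_1 + \mathbf{b}_2 \tag{2}
\]

That is, suppose $[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$. Find $[\mathbf{x}]_{\mathcal{C}}$.

SOLUTION Apply the coordinate mapping determined by $\mathcal{C}$ to $\mathbf{x}$ in (2). Since the
coordinate mapping is a linear transformation,

\[
[\mathbf{x}]_{\mathcal{C}} = [3\mathbf{b}_1 + \mathbf{b}_2]_{\mathcal{C}} = 3[\mathbf{b}_1]_{\mathcal{C}} + [\mathbf{b}_2]_{\mathcal{C}}
\]

We can write this vector equation as a matrix equation, using the vectors in the linear
combination as the columns of a matrix:

\[
[\mathbf{x}]_{\mathcal{C}} = \begin{bmatrix} [\mathbf{b}_1]_{\mathcal{C}} & [\mathbf{b}_2]_{\mathcal{C}} \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} \tag{3}
\]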


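¹ Think of $[\mathbf{x}]_{\mathcal{B}}$ as a “name” for $\mathbf{x}$ that lists the weights used to build $\mathbf{x}$ as a linear combination of the basis
vectors in $\mathcal{B}$.

This formula gives $[\mathbf{x}]_{\mathcal{C}}$, once we know the columns of the matrix. From (1),

\[
[\mathbf{b}_1]_{\mathcal{C}} = \begin{bmatrix} 4 \\ 1 \end{bmatrix} \quad \text{and} \quad [\mathbf{b}_2]_{\mathcal{C}} = \begin{bmatrix} -6 \\ 1 \end{bmatrix}
\]

Thus (3) provides the solution:

\[
[\mathbf{x}]_{\mathcal{C}} = \begin{bmatrix} 4 & -6 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \end{bmatrix}
\]

The $\mathcal{C}$-coordinates of $\mathbf{x}$ match those of the $\mathbf{x}$ in Fig. 1. ■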


The argument used to derive formula (3) can be generalized to yield the following 
result. (See Exercises 15 and 16.) 


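THEOREM 15 Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and $\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$ be bases of a vector space V. Then
there is a unique n × n matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ such that

\[
[\mathbf{x}]_{\mathcal{C}} = P_{\mathcal{C} \leftarrow \mathcal{B}} \, [\mathbf{x}]_{\mathcal{B}} \tag{4}
\]

The columns of $P_{\mathcal{C} \leftarrow \mathcal{B}}$ are the $\mathcal{C}$-coordinate vectors of the vectors in the basis $\mathcal{B}$.
That is,

\[
P_{\mathcal{C} \leftarrow \mathcal{B}} = \begin{bmatrix} [\mathbf{b}_1]_{\mathcal{C}} & [\mathbf{b}_2]_{\mathcal{C}} & \cdots & [\mathbf{b}_n]_{\mathcal{C}} \end{bmatrix} \tag{5}
\]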


The matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ in Theorem 15 is called the change-of-coordinates matrix from
$\mathcal{B}$ to $\mathcal{C}$. Multiplication by $P_{\mathcal{C} \leftarrow \mathcal{B}}$ converts $\mathcal{B}$-coordinates into $\mathcal{C}$-coordinates.² Figure 2
illustrates the change-of-coordinates equation (4).

FIGURE 2 Two coordinate systems for V.


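The columns of $P_{\mathcal{C} \leftarrow \mathcal{B}}$ are linearly independent because they are the coordinate
vectors of the linearly independent set $\mathcal{B}$. (See Exercise 25 in Section 4.4.) Since $P_{\mathcal{C} \leftarrow \mathcal{B}}$
is square, it must be invertible, by the Invertible Matrix Theorem. Left-multiplying both
sides of equation (4) by $(P_{\mathcal{C} \leftarrow \mathcal{B}})^{-1}$ yields

\[
(P_{\mathcal{C} \leftarrow \mathcal{B}})^{-1} [\mathbf{x}]_{\mathcal{C}} = [\mathbf{x}]_{\mathcal{B}}
\]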


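² To remember how to construct the matrix, think of $P_{\mathcal{C} \leftarrow \mathcal{B}}[\mathbf{x}]_{\mathcal{B}}$ as a linear combination of the columns of
$P_{\mathcal{C} \leftarrow \mathcal{B}}$. The matrix-vector product is a $\mathcal{C}$-coordinate vector, so the columns of $P_{\mathcal{C} \leftarrow \mathcal{B}}$ should be $\mathcal{C}$-coordinate
vectors, too.

Thus $(P_{\mathcal{C} \leftarrow \mathcal{B}})^{-1}$ is the matrix that converts $\mathcal{C}$-coordinates into $\mathcal{B}$-coordinates. That is,

\[
(P_{\mathcal{C} \leftarrow \mathcal{B}})^{-1} = P_{\mathcal{B} \leftarrow \mathcal{C}} \tag{6}
\]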


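Change of Basis in $\mathbb{R}^n$

If $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and $\mathcal{E}$ is the standard basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ in $\mathbb{R}^n$, then $[\mathbf{b}_1]_{\mathcal{E}} = \mathbf{b}_1$,
and likewise for the other vectors in $\mathcal{B}$. In this case, $P_{\mathcal{E} \leftarrow \mathcal{B}}$ is the same as the change-of-
coordinates matrix $P_{\mathcal{B}}$ introduced in Section 4.4, namely,

\[
P_{\mathcal{B}} = \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix}
\]

To change coordinates between two nonstandard bases in $\mathbb{R}^n$, we need Theorem 15.
The theorem shows that to solve the change-of-basis problem, we need the coordinate
vectors of the old basis relative to the new basis.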


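EXAMPLE 2 Let $\mathbf{b}_1 = \begin{bmatrix} -9 \\ 1 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -5 \\ -1 \end{bmatrix}$, $\mathbf{c}_1 = \begin{bmatrix} 1 \\ -4 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 3 \\ -5 \end{bmatrix}$, and consider
the bases for $\mathbb{R}^2$ given by $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$. Find the change-of-
coordinates matrix from $\mathcal{B}$ to $\mathcal{C}$.

SOLUTION The matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ involves the $\mathcal{C}$-coordinate vectors of $\mathbf{b}_1$ and $\mathbf{b}_2$. Let
$[\mathbf{b}_1]_{\mathcal{C}} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $[\mathbf{b}_2]_{\mathcal{C}} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$. Then, by definition,

\[
\begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{b}_1 \quad \text{and} \quad \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \mathbf{b}_2
\]

To solve both systems simultaneously, augment the coefficient matrix with $\mathbf{b}_1$ and $\mathbf{b}_2$,
and row reduce:

\[
\left[\begin{array}{cc|cc} \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{b}_1 & \mathbf{b}_2 \end{array}\right] = \left[\begin{array}{rr|rr} 1 & 3 & -9 & -5 \\ -4 & -5 & 1 & -1 \end{array}\right] \sim \left[\begin{array}{rr|rr} 1 & 0 & 6 & 4 \\ 0 & 1 & -5 & -3 \end{array}\right] \tag{7}
\]

Thus

\[
[\mathbf{b}_1]_{\mathcal{C}} = \begin{bmatrix} 6 \\ -5 \end{bmatrix} \quad \text{and} \quad [\mathbf{b}_2]_{\mathcal{C}} = \begin{bmatrix} 4 \\ -3 \end{bmatrix}
\]

The desired change-of-coordinates matrix is therefore

\[
P_{\mathcal{C} \leftarrow \mathcal{B}} = \begin{bmatrix} [\mathbf{b}_1]_{\mathcal{C}} & [\mathbf{b}_2]_{\mathcal{C}} \end{bmatrix} = \begin{bmatrix} 6 & 4 \\ -5 & -3 \end{bmatrix}
\]

■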


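Observe that the matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ in Example 2 already appeared in (7). This is not
surprising because the first column of $P_{\mathcal{C} \leftarrow \mathcal{B}}$ results from row reducing $[\,\mathbf{c}_1 \;\; \mathbf{c}_2 \mid \mathbf{b}_1\,]$ to
$[\,I \mid [\mathbf{b}_1]_{\mathcal{C}}\,]$, and similarly for the second column of $P_{\mathcal{C} \leftarrow \mathcal{B}}$. Thus

\[
\left[\begin{array}{cc|cc} \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{b}_1 & \mathbf{b}_2 \end{array}\right] \sim \left[\begin{array}{c|c} I & P_{\mathcal{C} \leftarrow \mathcal{B}} \end{array}\right]
\]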

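An analogous procedure works for finding the change-of-coordinates matrix between
any two bases in $\mathbb{R}^n$.

EXAMPLE 3 Let $\mathbf{b}_1 = \begin{bmatrix} 1 \\ -3 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}$, $\mathbf{c}_1 = \begin{bmatrix} -7 \\ 9 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} -5 \\ 7 \end{bmatrix}$, and consider
the bases for $\mathbb{R}^2$ given by $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$.

a. Find the change-of-coordinates matrix from $\mathcal{C}$ to $\mathcal{B}$.

b. Find the change-of-coordinates matrix from $\mathcal{B}$ to $\mathcal{C}$.

SOLUTION

a. Notice that $P_{\mathcal{B} \leftarrow \mathcal{C}}$ is needed rather than $P_{\mathcal{C} \leftarrow \mathcal{B}}$, and compute

\[
\left[\begin{array}{cc|cc} \mathbf{b}_1 & \mathbf{b}_2 & \mathbf{c}_1 & \mathbf{c}_2 \end{array}\right] = \left[\begin{array}{rr|rr} 1 & -2 & -7 & -5 \\ -3 & 4 & 9 & 7 \end{array}\right] \sim \left[\begin{array}{rr|rr} 1 & 0 & 5 & 3 \\ 0 & 1 & 6 & 4 \end{array}\right]
\]

So

\[
P_{\mathcal{B} \leftarrow \mathcal{C}} = \begin{bmatrix} 5 & 3 \\ 6 & 4 \end{bmatrix}
\]

b. By part (a) and property (6) above (with $\mathcal{B}$ and $\mathcal{C}$ interchanged),

\[
P_{\mathcal{C} \leftarrow \mathcal{B}} = (P_{\mathcal{B} \leftarrow \mathcal{C}})^{-1} = \frac{1}{2} \begin{bmatrix} 4 & -3 \\ -6 & 5 \end{bmatrix} = \begin{bmatrix} 2 & -3/2 \\ -3 & 5/2 \end{bmatrix}
\]

■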


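Another description of the change-of-coordinates matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ uses the change-of-
coordinate matrices $P_{\mathcal{B}}$ and $P_{\mathcal{C}}$ that convert $\mathcal{B}$-coordinates and $\mathcal{C}$-coordinates, respectively,
into standard coordinates. Recall that for each $\mathbf{x}$ in $\mathbb{R}^n$,

\[
P_{\mathcal{B}}[\mathbf{x}]_{\mathcal{B}} = \mathbf{x}, \quad P_{\mathcal{C}}[\mathbf{x}]_{\mathcal{C}} = \mathbf{x}, \quad \text{and} \quad [\mathbf{x}]_{\mathcal{C}} = P_{\mathcal{C}}^{-1}\mathbf{x}
\]

Thus

\[
[\mathbf{x}]_{\mathcal{C}} = P_{\mathcal{C}}^{-1}\mathbf{x} = P_{\mathcal{C}}^{-1} P_{\mathcal{B}} [\mathbf{x}]_{\mathcal{B}}
\]

In $\mathbb{R}^n$, the change-of-coordinates matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ may be computed as $P_{\mathcal{C}}^{-1} P_{\mathcal{B}}$. Actually,
for matrices larger than 2 × 2, an algorithm analogous to the one in Example 3 is faster
than computing $P_{\mathcal{C}}^{-1}$ and then $P_{\mathcal{C}}^{-1} P_{\mathcal{B}}$. See Exercise 12 in Section 2.2.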
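
The row-reduction procedure of Examples 2 and 3 is easy to automate. The sketch below is illustrative only: the function name `change_of_coordinates` is hypothetical, the inputs are the matrices whose columns are the basis vectors, and the code assumes that those columns really do form a basis (otherwise the pivot search fails). It reduces [ C | B ] to [ I | P ] with exact fractions and reads off P.

```python
from fractions import Fraction

def change_of_coordinates(C, B):
    """Row reduce [C | B] to [I | P] and return P.

    C and B are n x n matrices (lists of rows) whose columns are the
    vectors of the new basis and the old basis, respectively.
    Assumes the columns of C are linearly independent.
    """
    n = len(C)
    aug = [[Fraction(x) for x in C[i] + B[i]] for i in range(n)]
    for col in range(n):
        # Find a nonzero pivot in this column and move it into position.
        p = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[p] = aug[p], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        # Clear the rest of the column, above and below the pivot.
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

# Example 2: c1 = (1, -4), c2 = (3, -5), b1 = (-9, 1), b2 = (-5, -1).
C2 = [[1, 3], [-4, -5]]
B2 = [[-9, -5], [1, -1]]
P_C_from_B = change_of_coordinates(C2, B2)

# Example 3(a): b1 = (1, -3), b2 = (-2, 4), c1 = (-7, 9), c2 = (-5, 7).
B3 = [[1, -2], [-3, 4]]
C3 = [[-7, -5], [9, 7]]
P_B_from_C = change_of_coordinates(B3, C3)

print(P_C_from_B)
print(P_B_from_C)
```

Note the argument order: the basis being converted to goes first, matching the left block of the augmented matrix in (7).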


PRACTICE PROBLEMS 


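1. Let $\mathcal{F} = \{\mathbf{f}_1, \mathbf{f}_2\}$ and $\mathcal{G} = \{\mathbf{g}_1, \mathbf{g}_2\}$ be bases for a vector space V, and let P be a
matrix whose columns are $[\mathbf{f}_1]_{\mathcal{G}}$ and $[\mathbf{f}_2]_{\mathcal{G}}$. Which of the following equations is
satisfied by P for all $\mathbf{v}$ in V?

(i) $[\mathbf{v}]_{\mathcal{F}} = P[\mathbf{v}]_{\mathcal{G}}$   (ii) $[\mathbf{v}]_{\mathcal{G}} = P[\mathbf{v}]_{\mathcal{F}}$

2. Let $\mathcal{B}$ and $\mathcal{C}$ be as in Example 1. Use the results of that example to find the change-
of-coordinates matrix from $\mathcal{C}$ to $\mathcal{B}$.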


4.7 EXERCISES 


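1. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$ be bases for a vector space
V, and suppose $\mathbf{b}_1 = 6\mathbf{c}_1 - 2\mathbf{c}_2$ and $\mathbf{b}_2 = 9\mathbf{c}_1 - 4\mathbf{c}_2$.

a. Find the change-of-coordinates matrix from $\mathcal{B}$ to $\mathcal{C}$.

b. Find $[\mathbf{x}]_{\mathcal{C}}$ for $\mathbf{x} = -3\mathbf{b}_1 + 2\mathbf{b}_2$. Use part (a).

2. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$ be bases for a vector space
V, and suppose $\mathbf{b}_1 = -2\mathbf{c}_1 + 4\mathbf{c}_2$ and $\mathbf{b}_2 = 3\mathbf{c}_1 - 6\mathbf{c}_2$.

a. Find the change-of-coordinates matrix from $\mathcal{B}$ to $\mathcal{C}$.

b. Find $[\mathbf{x}]_{\mathcal{C}}$ for $\mathbf{x} = 2\mathbf{b}_1 + 3\mathbf{b}_2$.

3. Let $\mathcal{U} = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $\mathcal{W} = \{\mathbf{w}_1, \mathbf{w}_2\}$ be bases for V, and let
P be a matrix whose columns are $[\mathbf{u}_1]_{\mathcal{W}}$ and $[\mathbf{u}_2]_{\mathcal{W}}$. Which
of the following equations is satisfied by P for all $\mathbf{x}$ in V?

(i) $[\mathbf{x}]_{\mathcal{U}} = P[\mathbf{x}]_{\mathcal{W}}$   (ii) $[\mathbf{x}]_{\mathcal{W}} = P[\mathbf{x}]_{\mathcal{U}}$

4. Let $\mathcal{A} = \{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ and $\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2, \mathbf{d}_3\}$ be bases for V,
and let $P = [\,[\mathbf{d}_1]_{\mathcal{A}} \;\; [\mathbf{d}_2]_{\mathcal{A}} \;\; [\mathbf{d}_3]_{\mathcal{A}}\,]$. Which of the following
equations is satisfied by P for all $\mathbf{x}$ in V?

(i) $[\mathbf{x}]_{\mathcal{A}} = P[\mathbf{x}]_{\mathcal{D}}$   (ii) $[\mathbf{x}]_{\mathcal{D}} = P[\mathbf{x}]_{\mathcal{A}}$

5. Let $\mathcal{A} = \{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ and $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ be bases
for a vector space V, and suppose $\mathbf{a}_1 = 4\mathbf{b}_1 - \mathbf{b}_2$,
$\mathbf{a}_2 = -\mathbf{b}_1 + \mathbf{b}_2 + \mathbf{b}_3$, and $\mathbf{a}_3 = \mathbf{b}_2 - 2\mathbf{b}_3$.

a. Find the change-of-coordinates matrix from $\mathcal{A}$ to $\mathcal{B}$.

b. Find $[\mathbf{x}]_{\mathcal{B}}$ for $\mathbf{x} = 3\mathbf{a}_1 + 4\mathbf{a}_2 + \mathbf{a}_3$.

6. Let $\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2, \mathbf{d}_3\}$ and $\mathcal{F} = \{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}$ be bases for
a vector space V, and suppose $\mathbf{f}_1 = 2\mathbf{d}_1 - \mathbf{d}_2 + \mathbf{d}_3$,
$\mathbf{f}_2 = 3\mathbf{d}_2 + \mathbf{d}_3$, and $\mathbf{f}_3 = -3\mathbf{d}_1 + 2\mathbf{d}_3$.

a. Find the change-of-coordinates matrix from $\mathcal{F}$ to $\mathcal{D}$.

b. Find $[\mathbf{x}]_{\mathcal{D}}$ for $\mathbf{x} = \mathbf{f}_1 - 2\mathbf{f}_2 + 2\mathbf{f}_3$.

In Exercises 7-10, let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$ be bases for
$\mathbb{R}^2$. In each exercise, find the change-of-coordinates matrix from
$\mathcal{B}$ to $\mathcal{C}$ and the change-of-coordinates matrix from $\mathcal{C}$ to $\mathcal{B}$.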



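7. $\mathbf{b}_1 = \begin{bmatrix} 7 \\ 5 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -3 \\ -1 \end{bmatrix}$, $\mathbf{c}_1 = \begin{bmatrix} 1 \\ -5 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$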

-1 






1 


1 

8. bi = 

8 


,= 

-1 


, Ci = 

2 

,c 2 = 

1 


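9. $\mathbf{b}_1 = \begin{bmatrix} 4 \\ 4 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 8 \\ 4 \end{bmatrix}$, $\mathbf{c}_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$

10. $\mathbf{b}_1 = \begin{bmatrix} 6 \\ 12 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$, $\mathbf{c}_1 = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 3 \\ 9 \end{bmatrix}$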


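In Exercises 11 and 12, $\mathcal{B}$ and $\mathcal{C}$ are bases for a vector space V.
Mark each statement True or False. Justify each answer.

11. a. The columns of the change-of-coordinates matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$
are $\mathcal{B}$-coordinate vectors of the vectors in $\mathcal{C}$.

b. If V = $\mathbb{R}^n$ and $\mathcal{C}$ is the standard basis for V, then $P_{\mathcal{C} \leftarrow \mathcal{B}}$
is the same as the change-of-coordinates matrix $P_{\mathcal{B}}$ introduced
in Section 4.4.

12. a. The columns of $P_{\mathcal{C} \leftarrow \mathcal{B}}$ are linearly independent.

b. If V = $\mathbb{R}^2$, $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$, and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2\}$, then row
reduction of $[\,\mathbf{c}_1 \;\; \mathbf{c}_2 \;\; \mathbf{b}_1 \;\; \mathbf{b}_2\,]$ to $[\,I \;\; P\,]$ produces a
matrix P that satisfies $[\mathbf{x}]_{\mathcal{B}} = P[\mathbf{x}]_{\mathcal{C}}$ for all $\mathbf{x}$ in V.

13. In $\mathbb{P}_2$, find the change-of-coordinates matrix from the basis
$\mathcal{B} = \{1 - 2t + t^2,\; 3 - 5t + 4t^2,\; 2t + 3t^2\}$ to the standard
basis $\mathcal{C} = \{1, t, t^2\}$. Then find the $\mathcal{B}$-coordinate vector for
$-1 + 2t$.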

14. In $\mathbb{P}_2$, find the change-of-coordinates matrix from the basis
$\mathcal{B} = \{1 - 3t^2,\; 2 + t - 5t^2,\; 1 + 2t\}$ to the standard basis.
Then write $t^2$ as a linear combination of the polynomials in
$\mathcal{B}$.


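Exercises 15 and 16 provide a proof of Theorem 15. Fill in a
justification for each step.

15. Given $\mathbf{v}$ in V, there exist scalars $x_1, \ldots, x_n$, such that

\[
\mathbf{v} = x_1\mathbf{b}_1 + x_2\mathbf{b}_2 + \cdots + x_n\mathbf{b}_n
\]

because (a)_____. Apply the coordinate mapping determined
by the basis $\mathcal{C}$, and obtain

\[
[\mathbf{v}]_{\mathcal{C}} = x_1[\mathbf{b}_1]_{\mathcal{C}} + x_2[\mathbf{b}_2]_{\mathcal{C}} + \cdots + x_n[\mathbf{b}_n]_{\mathcal{C}}
\]

because (b)_____. This equation may be written in the form

\[
[\mathbf{v}]_{\mathcal{C}} = \begin{bmatrix} [\mathbf{b}_1]_{\mathcal{C}} & [\mathbf{b}_2]_{\mathcal{C}} & \cdots & [\mathbf{b}_n]_{\mathcal{C}} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \tag{8}
\]

by the definition of (c)_____. This shows that the matrix
$P_{\mathcal{C} \leftarrow \mathcal{B}}$ shown in (5) satisfies $[\mathbf{v}]_{\mathcal{C}} = P_{\mathcal{C} \leftarrow \mathcal{B}}[\mathbf{v}]_{\mathcal{B}}$ for each $\mathbf{v}$ in V,
because the vector on the right side of (8) is (d)_____.

16. Suppose Q is any matrix such that

\[
[\mathbf{v}]_{\mathcal{C}} = Q[\mathbf{v}]_{\mathcal{B}} \quad \text{for each } \mathbf{v} \text{ in } V \tag{9}
\]

Set $\mathbf{v} = \mathbf{b}_1$ in (9). Then (9) shows that $[\mathbf{b}_1]_{\mathcal{C}}$ is the first
column of Q because (a)_____. Similarly, for k = 2, ..., n,
the kth column of Q is (b)_____ because (c)_____. This
shows that the matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ defined by (5) in Theorem 15 is
the only matrix that satisfies condition (4).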

17. [M] Let $\mathcal{B} = \{\mathbf{x}_0, \ldots, \mathbf{x}_6\}$ and $\mathcal{C} = \{\mathbf{y}_0, \ldots, \mathbf{y}_6\}$, where $\mathbf{x}_k$ is
the function $\cos^k t$ and $\mathbf{y}_k$ is the function $\cos kt$. Exercise 34
in Section 4.5 showed that both $\mathcal{B}$ and $\mathcal{C}$ are bases for the
vector space H = Span $\{\mathbf{x}_0, \ldots, \mathbf{x}_6\}$.

a. Set $P = [\,[\mathbf{y}_0]_{\mathcal{B}} \;\cdots\; [\mathbf{y}_6]_{\mathcal{B}}\,]$, and calculate $P^{-1}$.

b. Explain why the columns of $P^{-1}$ are the $\mathcal{C}$-coordinate
vectors of $\mathbf{x}_0, \ldots, \mathbf{x}_6$. Then use these coordinate vectors
to write trigonometric identities that express powers of
cos t in terms of the functions in $\mathcal{C}$.

See the Study Guide.

18. [M] (Calculus required)³ Recall from calculus that integrals
such as

\[
\int (5\cos^3 t - 6\cos^4 t + 5\cos^5 t - 12\cos^6 t)\, dt \tag{10}
\]

are tedious to compute. (The usual method is to apply integration
by parts repeatedly and use the half-angle formula.)
Use the matrix P or $P^{-1}$ from Exercise 17 to transform (10);
then compute the integral.


³ The idea for Exercises 17 and 18 and five related exercises in earlier
sections came from a paper by Jack W. Rogers, Jr., of Auburn University,
presented at a meeting of the International Linear Algebra Society,
August 1995. See “Applications of Linear Algebra in Calculus,”
American Mathematical Monthly 104 (1), 1997.

19. [M] Let

a. Find a basis $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ for $\mathbb{R}^3$ such that P is the
change-of-coordinates matrix from $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ to the
basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. [Hint: What do the columns of $P_{\mathcal{C} \leftarrow \mathcal{B}}$
represent?]

b. Find a basis $\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}$ for $\mathbb{R}^3$ such that P is the change-
of-coordinates matrix from $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ to $\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3\}$.

20. Let B = {bi,b 2 }, C = {ci,c 2 }, and V = {di,d 2 } be bases 
for a two-dimensional vector space. 

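a. Write an equation that relates the matrices $P_{\mathcal{C} \leftarrow \mathcal{B}}$, $P_{\mathcal{D} \leftarrow \mathcal{C}}$,
and $P_{\mathcal{D} \leftarrow \mathcal{B}}$. Justify your result.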

b. [M] Use a matrix program either to help you find the 
equation or to check the equation you write. Work with 
three bases for R 2 . (See Exercises 7-10.) 


SOLUTIONS TO PRACTICE PROBLEMS 


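1. Since the columns of P are $\mathcal{G}$-coordinate vectors, a vector of the form $P\mathbf{x}$ must be
a $\mathcal{G}$-coordinate vector. Thus P satisfies equation (ii).

2. The coordinate vectors found in Example 1 show that

\[
P_{\mathcal{C} \leftarrow \mathcal{B}} = \begin{bmatrix} [\mathbf{b}_1]_{\mathcal{C}} & [\mathbf{b}_2]_{\mathcal{C}} \end{bmatrix} = \begin{bmatrix} 4 & -6 \\ 1 & 1 \end{bmatrix}
\]

Hence

\[
P_{\mathcal{B} \leftarrow \mathcal{C}} = (P_{\mathcal{C} \leftarrow \mathcal{B}})^{-1} = \frac{1}{10} \begin{bmatrix} 1 & 6 \\ -1 & 4 \end{bmatrix} = \begin{bmatrix} .1 & .6 \\ -.1 & .4 \end{bmatrix}
\]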


4.8 APPLICATIONS TO DIFFERENCE EQUATIONS 


Now that powerful computers are widely available, more and more scientific and 
engineering problems are being treated in a way that uses discrete, or digital, data rather 
than continuous data. Difference equations are often the appropriate tool to analyze 
such data. Even when a differential equation is used to model a continuous process, a 
numerical solution is often produced from a related difference equation. 

This section highlights some fundamental properties of linear difference equations 
that are best explained using linear algebra. 

Discrete-Time Signals 

The vector space $\mathbb{S}$ of discrete-time signals was introduced in Section 4.1. A signal in
$\mathbb{S}$ is a function defined only on the integers and is visualized as a sequence of numbers,
say, $\{y_k\}$. Figure 1 shows three typical signals whose general terms are $(.7)^k$, $1^k$, and
$(-1)^k$, respectively.

FIGURE 1 Three signals in $\mathbb{S}$.

Digital signals obviously arise in electrical and control systems engineering, but
discrete-data sequences are also generated in biology, physics, economics, demography,
and many other areas, wherever a process is measured, or sampled, at discrete time
intervals. When a process begins at a specific time, it is sometimes convenient to write
a signal as a sequence of the form $(y_0, y_1, y_2, \ldots)$. The terms $y_k$ for k < 0 either are
assumed to be zero or are simply omitted.


EXAMPLE 1 The crystal-clear sounds from a compact disc player are produced 
from music that has been sampled at the rate of 44,100 times per second. See Fig. 2. At 
each measurement, the amplitude of the music signal is recorded as a number, say, yk. 
The original music is composed of many different sounds of varying frequencies, yet 
the sequence {yk} contains enough information to reproduce all the frequencies in the 
sound up to about 20,000 cycles per second, higher than the human ear can sense. ■ 





Linear Independence in the Space S of Signals 

To simplify notation, we consider a set of only three signals in $\mathbb{S}$, say, $\{u_k\}$, $\{v_k\}$, and
$\{w_k\}$. They are linearly independent precisely when the equation

\[
c_1 u_k + c_2 v_k + c_3 w_k = 0 \quad \text{for all } k \tag{1}
\]

implies that $c_1 = c_2 = c_3 = 0$. The phrase “for all k” means for all integers: positive,
negative, and zero. One could also consider signals that start with k = 0, for example,
in which case, “for all k” would mean for all integers $k \ge 0$.

Suppose $c_1, c_2, c_3$ satisfy (1). Then equation (1) holds for any three consecutive
values of k, say, k, k + 1, and k + 2. Thus (1) implies that

\[
c_1 u_{k+1} + c_2 v_{k+1} + c_3 w_{k+1} = 0 \quad \text{for all } k
\]

and

\[
c_1 u_{k+2} + c_2 v_{k+2} + c_3 w_{k+2} = 0 \quad \text{for all } k
\]

SG The Casorati Test 4-30 


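Hence $c_1, c_2, c_3$ satisfy

\[
\begin{bmatrix} u_k & v_k & w_k \\ u_{k+1} & v_{k+1} & w_{k+1} \\ u_{k+2} & v_{k+2} & w_{k+2} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \quad \text{for all } k \tag{2}
\]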


The coefficient matrix in this system is called the Casorati matrix of the signals, and
the determinant of the matrix is called the Casoratian of $\{u_k\}$, $\{v_k\}$, and $\{w_k\}$. If
the Casorati matrix is invertible for at least one value of k, then (2) will imply that
$c_1 = c_2 = c_3 = 0$, which will prove that the three signals are linearly independent.

EXAMPLE 2 Verify that $1^k$, $(-2)^k$, and $3^k$ are linearly independent signals.

SOLUTION The Casorati matrix is

\[
\begin{bmatrix} 1^k & (-2)^k & 3^k \\ 1^{k+1} & (-2)^{k+1} & 3^{k+1} \\ 1^{k+2} & (-2)^{k+2} & 3^{k+2} \end{bmatrix}
\]

[Figure: The signals $1^k$, $(-2)^k$, and $3^k$.]

Row operations can show fairly easily that this matrix is always invertible. However, it
is faster to substitute a value for k (say, k = 0) and row reduce the numerical matrix:

\[
\begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & 3 \\ 1 & 4 & 9 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 \\ 0 & -3 & 2 \\ 0 & 3 & 8 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 \\ 0 & -3 & 2 \\ 0 & 0 & 10 \end{bmatrix}
\]

The Casorati matrix is invertible for k = 0. So $1^k$, $(-2)^k$, and $3^k$ are linearly
independent. ■
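
The test in Example 2 can also be carried out numerically. The following sketch is one possible way to do so and is not taken from the text: the helper names `casorati` and `det3` are hypothetical, and the Casoratian is computed by cofactor expansion rather than by row reduction. A nonzero value at even one k is enough to establish linear independence.

```python
from fractions import Fraction

def casorati(signals, k):
    """Casorati matrix of n signals at index k: row i holds the values at k + i."""
    n = len(signals)
    return [[s(k + i) for s in signals] for i in range(n)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# The three signals of Example 2, as exact-valued functions of k.
signals = [lambda k: Fraction(1) ** k,
           lambda k: Fraction(-2) ** k,
           lambda k: Fraction(3) ** k]

casoratian_at_0 = det3(casorati(signals, 0))
print(casoratian_at_0)  # nonzero, so the Casorati matrix is invertible at k = 0
```

The value at k = 0 equals the product of the pivots in the echelon form shown above, (1)(-3)(10), as it should because only row replacement operations were used there.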


If a Casorati matrix is not invertible, the associated signals being tested may or 
may not be linearly dependent. (See Exercise 33.) However, it can be shown that if 
the signals are all solutions of the same homogeneous difference equation (described 
below), then either the Casorati matrix is invertible for all k and the signals are linearly 
independent, or else the Casorati matrix is not invertible for all k and the signals are 
linearly dependent. A nice proof using linear transformations is in the Study Guide. 

Linear Difference Equations 

Given scalars $a_0, \ldots, a_n$, with $a_0$ and $a_n$ nonzero, and given a signal $\{z_k\}$, the equation

\[
a_0 y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = z_k \quad \text{for all } k \tag{3}
\]

is called a linear difference equation (or linear recurrence relation) of order n. For
simplicity, $a_0$ is often taken equal to 1. If $\{z_k\}$ is the zero sequence, the equation is
homogeneous; otherwise, the equation is nonhomogeneous.

EXAMPLE 3 In digital signal processing, a difference equation such as (3) describes
a linear filter, and $a_0, \ldots, a_n$ are called the filter coefficients. If $\{y_k\}$ is treated
as the input and $\{z_k\}$ as the output, then the solutions of the associated homogeneous
equation are the signals that are filtered out and transformed into the zero signal. Let us
feed two different signals into the filter

\[
.35y_{k+2} + .5y_{k+1} + .35y_k = z_k
\]

Here .35 is an abbreviation for $\sqrt{2}/4$. The first signal is created by sampling the
continuous signal $y = \cos(\pi t/4)$ at integer values of t, as in Fig. 3(a). The discrete
signal is

\[
\{y_k\} = \{\ldots, \cos(0), \cos(\pi/4), \cos(2\pi/4), \cos(3\pi/4), \ldots\}
\]

For simplicity, write $\pm .7$ in place of $\pm\sqrt{2}/2$, so that

\[
\{y_k\} = \{\ldots, 1, .7, 0, -.7, -1, -.7, 0, .7, 1, .7, 0, \ldots\}
\]

where the first 1 shown is the term for k = 0.


Table 1 shows a calculation of the output sequence $\{z_k\}$, where .35(.7) is an abbreviation
for $(\sqrt{2}/4)(\sqrt{2}/2) = .25$. The output is $\{y_k\}$, shifted by one term.

FIGURE 3 Discrete signals with different frequencies.

TABLE 1 Computing the Output of a Filter

k    y_k    y_{k+1}   y_{k+2}   .35y_k + .5y_{k+1} + .35y_{k+2} = z_k
0    1      .7        0         .35(1)   + .5(.7)  + .35(0)   =  .7
1    .7     0         -.7       .35(.7)  + .5(0)   + .35(-.7) =  0
2    0      -.7       -1        .35(0)   + .5(-.7) + .35(-1)  = -.7
3    -.7    -1        -.7       .35(-.7) + .5(-1)  + .35(-.7) = -1
4    -1     -.7       0         .35(-1)  + .5(-.7) + .35(0)   = -.7
5    -.7    0         .7        .35(-.7) + .5(0)   + .35(.7)  =  0


A different input signal is produced from the higher frequency signal
$y = \cos(3\pi t/4)$, shown in Fig. 3(b). Sampling at the same rate as before produces a new
input sequence:

\[
\{w_k\} = \{\ldots, 1, -.7, 0, .7, -1, .7, 0, -.7, 1, -.7, 0, \ldots\}
\]

where the first 1 shown is the term for k = 0.

When $\{w_k\}$ is fed into the filter, the output is the zero sequence. The filter, called a
low-pass filter, lets $\{y_k\}$ pass through, but stops the higher frequency $\{w_k\}$. ■
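
The behavior of the filter in Example 3 can be reproduced with a few lines of code. This is a rough sketch under stated assumptions: it uses the unrounded coefficient √2/4 in place of the abbreviation .35, floating-point arithmetic rather than the rounded values of Table 1, and the function name `low_pass` is hypothetical.

```python
import math

def low_pass(signal, k):
    """Output z_k of the filter (sqrt(2)/4) y_{k+2} + .5 y_{k+1} + (sqrt(2)/4) y_k."""
    a = math.sqrt(2) / 4
    return a * signal(k + 2) + 0.5 * signal(k + 1) + a * signal(k)

# The two sampled input signals of Example 3.
y = lambda k: math.cos(math.pi * k / 4)       # lower frequency
w = lambda k: math.cos(3 * math.pi * k / 4)   # higher frequency

z_from_y = [low_pass(y, k) for k in range(6)]
z_from_w = [low_pass(w, k) for k in range(6)]

# Compare with Table 1: the first output should match y shifted by one term,
# and the second output should vanish up to rounding error.
for k in range(6):
    print(k, round(z_from_y[k], 4), round(y(k + 1), 4), round(z_from_w[k], 4))
```

Because floating-point values are used, the outputs agree with the exact results only up to rounding error, which is why the comparison is made after rounding.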

In many applications, a sequence $\{z_k\}$ is specified for the right side of a difference
equation (3), and a $\{y_k\}$ that satisfies (3) is called a solution of the equation. The next
example shows how to find solutions for a homogeneous equation.

EXAMPLE 4 Solutions of a homogeneous difference equation often have the form
$y_k = r^k$ for some r. Find some solutions of the equation

\[
y_{k+3} - 2y_{k+2} - 5y_{k+1} + 6y_k = 0 \quad \text{for all } k \tag{4}
\]

SOLUTION Substitute $r^k$ for $y_k$ in the equation and factor the left side:

\[
r^{k+3} - 2r^{k+2} - 5r^{k+1} + 6r^k = 0 \tag{5}
\]
\[
r^k(r^3 - 2r^2 - 5r + 6) = 0
\]
\[
r^k(r - 1)(r + 2)(r - 3) = 0 \tag{6}
\]

Since (5) is equivalent to (6), $r^k$ satisfies the difference equation (4) if and only if r
satisfies (6). Thus $1^k$, $(-2)^k$, and $3^k$ are all solutions of (4). For instance, to verify that
$3^k$ is a solution of (4), compute

\[
3^{k+3} - 2 \cdot 3^{k+2} - 5 \cdot 3^{k+1} + 6 \cdot 3^k = 3^k(27 - 18 - 15 + 6) = 0 \quad \text{for all } k
\]

■

In general, a nonzero signal $r^k$ satisfies the homogeneous difference equation

\[
y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = 0 \quad \text{for all } k
\]

if and only if r is a root of the auxiliary equation

\[
r^n + a_1 r^{n-1} + \cdots + a_{n-1} r + a_n = 0
\]

We will not consider the case in which r is a repeated root of the auxiliary equation.
When the auxiliary equation has a complex root, the difference equation has solutions
of the form $s^k \cos k\omega$ and $s^k \sin k\omega$, for constants s and $\omega$. This happened in Example 3.

Solution Sets of Linear Difference Equations 

Given $a_1, \ldots, a_n$, consider the mapping $T : \mathbb{S} \to \mathbb{S}$ that transforms a signal $\{y_k\}$ into a
signal $\{w_k\}$ given by

\[
w_k = y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k
\]

It is readily checked that T is a linear transformation. This implies that the solution set
of the homogeneous equation

\[
y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = 0 \quad \text{for all } k
\]

is the kernel of T (the set of signals that T maps into the zero signal), and hence the
solution set is a subspace of $\mathbb{S}$. Any linear combination of solutions is again a solution.

The next theorem, a simple but basic result, will lead to more information about the 
solution sets of difference equations. 


THEOREM 16 If $a_n \ne 0$ and if $\{z_k\}$ is given, the equation

\[
y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = z_k \quad \text{for all } k \tag{7}
\]

has a unique solution whenever $y_0, \ldots, y_{n-1}$ are specified.


PROOF If $y_0, \ldots, y_{n-1}$ are specified, use (7) to define

\[
y_n = z_0 - [\, a_1 y_{n-1} + \cdots + a_{n-1} y_1 + a_n y_0 \,]
\]

And now that $y_1, \ldots, y_n$ are specified, use (7) to define $y_{n+1}$. In general, use the
recurrence relation

\[
y_{n+k} = z_k - [\, a_1 y_{k+n-1} + \cdots + a_n y_k \,] \tag{8}
\]

to define $y_{n+k}$ for $k \ge 0$. To define $y_k$ for k < 0, use the recurrence relation

\[
y_k = \frac{1}{a_n} z_k - \frac{1}{a_n} [\, y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} \,] \tag{9}
\]

This produces a signal that satisfies (7). Conversely, any signal that satisfies (7) for all
k certainly satisfies (8) and (9), so the solution of (7) is unique. ■
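
The proof of Theorem 16 is constructive, and the two recurrence relations (8) and (9) translate directly into code. The following is a minimal sketch, assuming exact rational arithmetic; the function name `solve_recurrence` and its parameters are hypothetical. It builds the terms forward with (8) and backward with (9) from the specified values $y_0, \ldots, y_{n-1}$.

```python
from fractions import Fraction

def solve_recurrence(a, z, initial, k_min, k_max):
    """Generate the unique solution of (7) on k_min <= k <= k_max.

    a       -- coefficients [a_1, ..., a_n], with a_n nonzero
    z       -- function giving the right side z_k
    initial -- the specified values [y_0, ..., y_{n-1}]
    Returns a dict mapping k to y_k.
    """
    n = len(a)
    y = {k: Fraction(v) for k, v in enumerate(initial)}
    # Relation (8): y_{n+k} = z_k - [a_1 y_{k+n-1} + ... + a_n y_k], for k >= 0.
    for k in range(0, k_max - n + 1):
        y[k + n] = z(k) - sum(a[j] * y[k + n - 1 - j] for j in range(n))
    # Relation (9): y_k = (z_k - [y_{k+n} + a_1 y_{k+n-1} + ... + a_{n-1} y_{k+1}]) / a_n, for k < 0.
    for k in range(-1, k_min - 1, -1):
        known = y[k + n] + sum(a[j] * y[k + n - 1 - j] for j in range(n - 1))
        y[k] = (z(k) - known) / a[-1]
    return y

# Equation (4): y_{k+3} - 2y_{k+2} - 5y_{k+1} + 6y_k = 0, with y_0, y_1, y_2 = 1, 3, 9.
y = solve_recurrence([-2, -5, 6], lambda k: 0, [1, 3, 9], k_min=-3, k_max=7)
print([y[k] for k in range(-3, 8)])
```

With these initial values the generated signal is $3^k$ for every k in the range, including the negative indices, which illustrates the uniqueness claim: $3^k$ is the only solution of (4) that starts 1, 3, 9.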


THEOREM 17 The set H of all solutions of the nth-order homogeneous linear difference equation

\[
y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = 0 \quad \text{for all } k \tag{10}
\]

is an n-dimensional vector space.

PROOF As was pointed out earlier, H is a subspace of $\mathbb{S}$ because H is the kernel
of a linear transformation. For $\{y_k\}$ in H, let $F\{y_k\}$ be the vector in $\mathbb{R}^n$ given by
$(y_0, y_1, \ldots, y_{n-1})$. It is readily verified that $F : H \to \mathbb{R}^n$ is a linear transformation.

Given any vector $(y_0, y_1, \ldots, y_{n-1})$ in $\mathbb{R}^n$, Theorem 16 says that there is a unique
signal $\{y_k\}$ in H such that $F\{y_k\} = (y_0, y_1, \ldots, y_{n-1})$. This means that F is a one-
to-one linear transformation of H onto $\mathbb{R}^n$; that is, F is an isomorphism. Thus
dim H = dim $\mathbb{R}^n$ = n. (See Exercise 32 in Section 4.5.) ■

EXAMPLE 5 Find a basis for the set of all solutions to the difference equation

\[ y_{k+3} - 2y_{k+2} - 5y_{k+1} + 6y_k = 0 \quad \text{for all } k \]

SOLUTION Our work in linear algebra really pays off now! We know from Examples 2
and 4 that \(1^k\), \((-2)^k\), and \(3^k\) are linearly independent solutions. In general, it can be
difficult to verify directly that a set of signals spans the solution space. But that is no
problem here because of two key theorems: Theorem 17, which shows that the solution
space is exactly three-dimensional, and the Basis Theorem in Section 4.5, which says
that a linearly independent set of n vectors in an n-dimensional space is automatically
a basis. So \(1^k\), \((-2)^k\), and \(3^k\) form a basis for the solution space. ■

The standard way to describe the “general solution” of the difference equation (10) 
is to exhibit a basis for the subspace of all solutions. Such a basis is usually called a 
fundamental set of solutions of (10). In practice, if you can find n linearly independent 
signals that satisfy (10), they will automatically span the n-dimensional solution space,
as explained in Example 5. 
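
As a hedged illustration of that recipe, the Python sketch below checks the three signals of Example 5 numerically. The helper names `satisfies` and `casorati_det3` are hypothetical, the check over finitely many k is a sanity test rather than a proof, and the determinant is that of the Casorati matrix formed from the signals at k, k + 1, k + 2.

```python
def satisfies(signal, coeffs, ks):
    """Check y_{k+n} + a_1 y_{k+n-1} + ... + a_n y_k == 0 for each k in ks."""
    n = len(coeffs)
    return all(
        signal(k + n) + sum(a * signal(k + n - 1 - i) for i, a in enumerate(coeffs)) == 0
        for k in ks
    )

def casorati_det3(signals, k):
    """Determinant of the 3 x 3 Casorati matrix of three signals at index k."""
    m = [[s(k + row) for s in signals] for row in range(3)]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

signals = [lambda k: 1, lambda k: (-2) ** k, lambda k: 3 ** k]
coeffs = [-2, -5, 6]          # y_{k+3} - 2y_{k+2} - 5y_{k+1} + 6y_k = 0

all_solve = all(satisfies(s, coeffs, range(10)) for s in signals)
det_at_0 = casorati_det3(signals, 0)
print(all_solve, det_at_0)
```

A nonzero determinant at a single k shows the three signals are linearly independent; Theorem 17 and the Basis Theorem then supply the spanning property without further computation.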


Nonhomogeneous Equations 

The general solution of the nonhomogeneous difference equation

\[ y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = z_k \quad \text{for all } k \qquad (11) \]

can be written as one particular solution of (11) plus an arbitrary linear combination of
a fundamental set of solutions of the corresponding homogeneous equation (10). This
fact is analogous to the result in Section 1.5 showing that the solution sets of \(A\mathbf{x} = \mathbf{b}\)
and \(A\mathbf{x} = \mathbf{0}\) are parallel. Both results have the same explanation: The mapping \(\mathbf{x} \mapsto A\mathbf{x}\)
is linear, and the mapping that transforms the signal \(\{y_k\}\) into the signal \(\{z_k\}\) in (11) is
linear. See Exercise 35.

EXAMPLE 6 Verify that the signal \(y_k = k^2\) satisfies the difference equation

\[ y_{k+2} - 4y_{k+1} + 3y_k = -4k \quad \text{for all } k \qquad (12) \]

Then find a description of all solutions of this equation.

SOLUTION Substitute \(k^2\) for \(y_k\) on the left side of (12):

\[ (k+2)^2 - 4(k+1)^2 + 3k^2 = (k^2 + 4k + 4) - 4(k^2 + 2k + 1) + 3k^2 = -4k \]

So \(k^2\) is indeed a solution of (12). The next step is to solve the homogeneous equation

\[ y_{k+2} - 4y_{k+1} + 3y_k = 0 \qquad (13) \]

The auxiliary equation is

\[ r^2 - 4r + 3 = (r - 1)(r - 3) = 0 \]

FIGURE 4 Solution sets of difference equations (12) and (13).

The roots are r = 1, 3. So two solutions of the homogeneous difference equation are \(1^k\)
and \(3^k\). They are obviously not multiples of each other, so they are linearly independent
signals. By Theorem 17, the solution space is two-dimensional, so \(3^k\) and \(1^k\) form a basis
for the set of solutions of equation (13). Translating that set by a particular solution of
the nonhomogeneous equation (12), we obtain the general solution of (12):

\[ k^2 + c_1 1^k + c_2 3^k, \quad \text{or} \quad k^2 + c_1 + c_2 3^k \]

Figure 4 gives a geometric visualization of the two solution sets. Each point in the figure
corresponds to one signal in \(\mathbb{S}\). ■


Reduction to Systems of First-Order Equations 

A modern way to study a homogeneous nth-order linear difference equation is to replace
it by an equivalent system of first-order difference equations, written in the form

\[ \mathbf{x}_{k+1} = A\mathbf{x}_k \quad \text{for all } k \]

where the vectors \(\mathbf{x}_k\) are in \(\mathbb{R}^n\) and A is an n × n matrix.

A simple example of such a (vector-valued) difference equation was already studied
in Section 1.10. Further examples will be covered in Sections 4.9 and 5.6.


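EXAMPLE 7 Write the following difference equation as a first-order system:

\[ y_{k+3} - 2y_{k+2} - 5y_{k+1} + 6y_k = 0 \quad \text{for all } k \]

SOLUTION For each k, set

\[ \mathbf{x}_k = \begin{bmatrix} y_k \\ y_{k+1} \\ y_{k+2} \end{bmatrix} \]

The difference equation says that \(y_{k+3} = -6y_k + 5y_{k+1} + 2y_{k+2}\), so

\[ \mathbf{x}_{k+1} = \begin{bmatrix} y_{k+1} \\ y_{k+2} \\ y_{k+3} \end{bmatrix}
= \begin{bmatrix} 0 + y_{k+1} + 0 \\ 0 + 0 + y_{k+2} \\ -6y_k + 5y_{k+1} + 2y_{k+2} \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6 & 5 & 2 \end{bmatrix}
\begin{bmatrix} y_k \\ y_{k+1} \\ y_{k+2} \end{bmatrix} \]

That is,

\[ \mathbf{x}_{k+1} = A\mathbf{x}_k \ \text{ for all } k, \quad \text{where } A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -6 & 5 & 2 \end{bmatrix} \]

■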
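
The pattern in the matrix A of Example 7 (ones just above the diagonal, negated coefficients in the bottom row) is mechanical enough to sketch in Python. The names `companion` and `step` below are hypothetical helpers introduced only for this illustration.

```python
def companion(coeffs):
    """Matrix A for y_{k+n} + a_1 y_{k+n-1} + ... + a_n y_k = 0, given [a_1, ..., a_n]."""
    n = len(coeffs)
    # Rows 1..n-1 shift the entries of x_k up by one position.
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    # Last row holds -a_n, ..., -a_1, so it reproduces y_{k+n}.
    rows.append([-a for a in reversed(coeffs)])
    return rows

def step(matrix, vector):
    """Return the product matrix * vector, that is, x_{k+1} from x_k."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

A = companion([-2, -5, 6])     # coefficients of the equation in Example 7
x0 = [1, 3, 9]                 # (y_0, y_1, y_2) for the solution y_k = 3^k
x1 = step(A, x0)               # should hold (y_1, y_2, y_3)
print(A, x1)
```

Applying `step` repeatedly generates the signal three entries at a time, which is the first-order system at work.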


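In general, the equation

\[ y_{k+n} + a_1 y_{k+n-1} + \cdots + a_{n-1} y_{k+1} + a_n y_k = 0 \quad \text{for all } k \]

can be rewritten as \(\mathbf{x}_{k+1} = A\mathbf{x}_k\) for all k, where

\[ \mathbf{x}_k = \begin{bmatrix} y_k \\ y_{k+1} \\ \vdots \\ y_{k+n-1} \end{bmatrix}, \quad
A = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & & 1 \\
-a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1
\end{bmatrix} \]

PRACTICE PROBLEM

It can be shown that the signals \(2^k\), \(3^k \sin\frac{k\pi}{2}\), and \(3^k \cos\frac{k\pi}{2}\) are solutions of

\[ y_{k+3} - 2y_{k+2} + 9y_{k+1} - 18y_k = 0 \]

Show that these signals form a basis for the set of all solutions of the difference equation.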


4.8 EXERCISES 

Verify that the signals in Exercises 1 and 2 are solutions of the 
accompanying difference equation. 

1. \(2^k\), \((-4)^k\); \(y_{k+2} + 2y_{k+1} - 8y_k = 0\)

2. \(5^k\), \((-5)^k\); \(y_{k+2} - 25y_k = 0\)

Show that the signals in Exercises 3-6 form a basis for the solution
set of the accompanying difference equation.

3. The signals and equation in Exercise 1

4. The signals and equation in Exercise 2

5. \((-2)^k\), \(k(-2)^k\); \(y_{k+2} + 4y_{k+1} + 4y_k = 0\)

6. \(4^k \cos\frac{k\pi}{2}\), \(4^k \sin\frac{k\pi}{2}\); \(y_{k+2} + 16y_k = 0\)

In Exercises 7-12, assume the signals listed are solutions of
the given difference equation. Do the signals form a basis for
the solution space of the equation? Justify your answers using
appropriate theorems.

7. \(1^k\), \(2^k\), \((-2)^k\); \(y_{k+3} - y_{k+2} - 4y_{k+1} + 4y_k = 0\)

8. \((-1)^k\), \(2^k\), \(3^k\); \(y_{k+3} - 4y_{k+2} + y_{k+1} + 6y_k = 0\)

9. \(2^k\), \(5^k \cos\frac{k\pi}{2}\), \(5^k \sin\frac{k\pi}{2}\); \(y_{k+3} - 2y_{k+2} + 25y_{k+1} - 50y_k = 0\)

10. \((-2)^k\), \(k(-2)^k\), \(3^k\); \(y_{k+3} + y_{k+2} - 8y_{k+1} - 12y_k = 0\)

11. \((-1)^k\), \(2^k\); \(y_{k+3} - 3y_{k+2} + 4y_k = 0\)

12. \(3^k\), \((-2)^k\); \(y_{k+4} - 13y_{k+2} + 36y_k = 0\)


In Exercises 13-16, find a basis for the solution space of the 
difference equation. Prove that the solutions you find span the 
solution set. 

13 . yk -\-2 — yk+\ + \yk = o 14 . yk +2 — 5^+1 + 6yk = 0 

15 . 6y k+2 + ^+i -2y k = 0 16 . y k+2 - 25y k = 0 

Exercises 17 and 18 concern a simple model of the national
economy described by the difference equation

\[ Y_{k+2} - a(1 + b)Y_{k+1} + abY_k = 1 \qquad (14) \]

Here \(Y_k\) is the total national income during year k, a is a constant
less than 1, called the marginal propensity to consume, and b is
a positive constant of adjustment that describes how changes in
consumer spending affect the annual rate of private investment.¹

17. Find the general solution of equation (14) when a = .9 and
b = 4/9. What happens to \(Y_k\) as k increases? [Hint: First find a
particular solution of the form \(Y_k = T\), where T is a constant,
called the equilibrium level of national income.]

18. Find the general solution of equation (14) when a = .9 and
b = .5.


1 For example, see Discrete Dynamical Systems, by James T. Sandefur 
(Oxford: Clarendon Press, 1990), pp. 267-276. The original 
accelerator-multiplier model is attributed to the economist P. A. 
Samuelson. 








A lightweight cantilevered beam is supported at N points spaced
10 ft apart, and a weight of 500 lb is placed at the end of the
beam, 10 ft from the first support, as in the figure. Let \(y_k\) be
the bending moment at the kth support. Then \(y_1 = 5000\) ft-lb.
Suppose the beam is rigidly attached at the Nth support and the
bending moment there is zero. In between, the moments satisfy
the three-moment equation

\[ y_{k+2} + 4y_{k+1} + y_k = 0 \quad \text{for } k = 1, 2, \ldots, N - 2 \qquad (15) \]

Bending moments on a cantilevered beam.

19. Find the general solution of difference equation (15). Justify
your answer.

20. Find the particular solution of (15) that satisfies the boundary
conditions \(y_1 = 5000\) and \(y_N = 0\). (The answer involves
N.)

21. When a signal is produced from a sequence of measurements 
made on a process (a chemical reaction, a flow of heat 
through a tube, a moving robot arm, etc.), the signal usually 
contains random noise produced by measurement errors. A 
standard method of preprocessing the data to reduce the noise 
is to smooth or filter the data. One simple filter is a moving 
average that replaces each \(y_k\) by its average with the two
adjacent values:

\[ \tfrac{1}{3} y_{k+1} + \tfrac{1}{3} y_k + \tfrac{1}{3} y_{k-1} = z_k \quad \text{for } k = 1, 2, \ldots \]

Suppose a signal \(y_k\), for k = 0, ..., 14, is

9, 5, 7, 3, 2, 4, 6, 5, 7, 6, 8, 10, 9, 5, 7

Use the filter to compute \(z_1, \ldots, z_{13}\). Make a broken-line
graph that superimposes the original signal and the smoothed 
signal. 

22. Let \(\{y_k\}\) be the sequence produced by sampling the continuous
signal \(2\cos\frac{\pi t}{4} + \cos\frac{3\pi t}{4}\) at t = 0, 1, 2, ..., as shown in
the figure. The values of \(y_k\), beginning with k = 0, are

3, .7, 0, -.7, -3, -.7, 0, .7, 3, .7, 0, ...

where .7 is an abbreviation for \(\sqrt{2}/2\).

a. Compute the output signal \(\{z_k\}\) when \(\{y_k\}\) is fed into the
filter in Example 3.

b. Explain how and why the output in part (a) is related to 
the calculations in Example 3. 



Exercises 23 and 24 refer to a difference equation of the form
\(y_{k+1} - ay_k = b\), for suitable constants a and b.

23. A loan of $10,000 has an interest rate of 1% per month and a
monthly payment of $450. The loan is made at month k = 0,
and the first payment is made one month later, at k = 1. For
k = 0, 1, 2, ..., let \(y_k\) be the unpaid balance of the loan just
after the kth monthly payment. Thus

\[ \underbrace{y_1}_{\text{New balance}} = \underbrace{10{,}000}_{\text{Balance due}} + \underbrace{(.01)10{,}000}_{\text{Interest added}} - \underbrace{450}_{\text{Payment}} \]

a. Write a difference equation satisfied by \(\{y_k\}\).

b. [M] Create a table showing k and the balance at month 
k. List the program or the keystrokes you used to create 
the table. 

c. [M] What will k be when the last payment is made? How 
much will the last payment be? How much money did the 
borrower pay in total? 

24. At time k = 0, an initial investment of $1000 is made into a
savings account that pays 6% interest per year compounded
monthly. (The interest rate per month is .005.) Each month
after the initial investment, an additional $200 is added to
the account. For k = 0, 1, 2, ..., let \(y_k\) be the amount in the
account at time k, just after a deposit has been made.

a. Write a difference equation satisfied by \(\{y_k\}\).

b. [M] Create a table showing k and the total amount in the 
savings account at month k, for k = 0 through 60. List 
your program or the keystrokes you used to create the 
table. 

c. [M] How much will be in the account after two years (that 
is, 24 months), four years, and five years? How much of 
the five-year total is interest? 

In Exercises 25-28, show that the given signal is a solution of
the difference equation. Then find the general solution of that
difference equation.

25. \(y_k = k^2\); \(y_{k+2} + 3y_{k+1} - 4y_k = 7 + 10k\)

26. \(y_k = 1 + k\); \(y_{k+2} - 6y_{k+1} + 5y_k = -4\)

27. \(y_k = k - 2\); \(y_{k+2} - 4y_k = 8 - 3k\)

28. \(y_k = 1 + 2k\); \(y_{k+2} - 25y_k = -48k - 20\)

Write the difference equations in Exercises 29 and 30 as first-order
systems, \(\mathbf{x}_{k+1} = A\mathbf{x}_k\), for all k.

29. \(y_{k+4} + 3y_{k+3} - 8y_{k+2} + 6y_{k+1} - 2y_k = 0\)

30. \(y_{k+3} - 5y_{k+2} + 8y_k = 0\)

31. Is the following difference equation of order 3? Explain.
\(y_{k+3} + 5y_{k+2} + 6y_{k+1} = 0\)

32. What is the order of the following difference equation? Explain
your answer.

\(y_{k+3} + a_1 y_{k+2} + a_2 y_{k+1} + a_3 y_k = 0\)

33. Let \(y_k = k^2\) and \(z_k = 2k|k|\). Are the signals \(\{y_k\}\) and
\(\{z_k\}\) linearly independent? Evaluate the associated Casorati
matrix C(k) for k = 0, k = -1, and k = -2, and discuss
your results.

34 . Let /, g, and h be linearly independent functions defined for 
all real numbers, and construct three signals by sampling the 
values of the functions at the integers: 

Uk = f(k), v k = g(k), w k = h(k) 


Must the signals be linearly independent in \(\mathbb{S}\)? Discuss.

35. Let a and b be nonzero numbers. Show that the mapping T
defined by \(T\{y_k\} = \{w_k\}\), where

\[ w_k = y_{k+2} + ay_{k+1} + by_k \]

is a linear transformation from \(\mathbb{S}\) into \(\mathbb{S}\).

36. Let V be a vector space, and let \(T : V \to V\) be a linear
transformation. Given \(\mathbf{z}\) in V, suppose \(\mathbf{x}_p\) in V satisfies
\(T(\mathbf{x}_p) = \mathbf{z}\), and let \(\mathbf{u}\) be any vector in the kernel of T.
Show that \(\mathbf{u} + \mathbf{x}_p\) satisfies the nonhomogeneous equation
\(T(\mathbf{x}) = \mathbf{z}\).

37. Let \(\mathbb{S}_0\) be the vector space of all sequences of the form
\((y_0, y_1, y_2, \ldots)\), and define linear transformations T and D
from \(\mathbb{S}_0\) into \(\mathbb{S}_0\) by

\[ T(y_0, y_1, y_2, \ldots) = (y_1, y_2, y_3, \ldots) \]
\[ D(y_0, y_1, y_2, \ldots) = (0, y_0, y_1, y_2, \ldots) \]

Show that TD = I (the identity transformation on \(\mathbb{S}_0\)) and
yet DT ≠ I.


SOLUTION TO PRACTICE PROBLEM 


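Examine the Casorati matrix:

\[ C(k) = \begin{bmatrix}
2^k & 3^k \sin\frac{k\pi}{2} & 3^k \cos\frac{k\pi}{2} \\
2^{k+1} & 3^{k+1} \sin\frac{(k+1)\pi}{2} & 3^{k+1} \cos\frac{(k+1)\pi}{2} \\
2^{k+2} & 3^{k+2} \sin\frac{(k+2)\pi}{2} & 3^{k+2} \cos\frac{(k+2)\pi}{2}
\end{bmatrix} \]

Set k = 0 and row reduce the matrix to verify that it has three pivot positions and hence
is invertible:

\[ C(0) = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 3 & 0 \\ 4 & 0 & -9 \end{bmatrix}
\sim \begin{bmatrix} 1 & 0 & 1 \\ 0 & 3 & -2 \\ 0 & 0 & -13 \end{bmatrix} \]

The Casorati matrix is invertible at k = 0, so the signals are linearly independent.
Since there are three signals, and the solution space H of the difference equation has
dimension 3 (Theorem 17), the signals form a basis for H, by the Basis Theorem.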


4.9 APPLICATIONS TO MARKOV CHAINS 


The Markov chains described in this section are used as mathematical models of a
wide variety of situations in biology, business, chemistry, engineering, physics, and
elsewhere. In each case, the model is used to describe an experiment or measurement
that is performed many times in the same way, where the outcome of each trial of the
experiment will be one of several specified possible outcomes, and where the outcome
of one trial depends only on the immediately preceding trial.

For example, if the population of a city and its suburbs were measured each year,
then a vector such as

\[ \mathbf{x}_0 = \begin{bmatrix} .60 \\ .40 \end{bmatrix} \qquad (1) \]

could indicate that 60% of the population lives in the city and 40% in the suburbs. The
decimals in \(\mathbf{x}_0\) add up to 1 because they account for the entire population of the region.
Percentages are more convenient for our purposes here than population totals.

A vector with nonnegative entries that add up to 1 is called a probability vector. A
stochastic matrix is a square matrix whose columns are probability vectors. A Markov
chain is a sequence of probability vectors \(\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots\), together with a stochastic
matrix P, such that

\[ \mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P\mathbf{x}_1, \quad \mathbf{x}_3 = P\mathbf{x}_2, \quad \ldots \]

Thus the Markov chain is described by the first-order difference equation

\[ \mathbf{x}_{k+1} = P\mathbf{x}_k \quad \text{for } k = 0, 1, 2, \ldots \]

When a Markov chain of vectors in \(\mathbb{R}^n\) describes a system or a sequence of
experiments, the entries in \(\mathbf{x}_k\) list, respectively, the probabilities that the system is in
each of n possible states, or the probabilities that the outcome of the experiment is one
of n possible outcomes. For this reason, \(\mathbf{x}_k\) is often called a state vector.

EXAMPLE 1 Section 1.10 examined a model for population movement between a
city and its suburbs. See Fig. 1. The annual migration between these two parts of the
metropolitan region was governed by the migration matrix M:

\[ M = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \]

(columns, From: City, Suburbs; rows, To: City, Suburbs)

That is, each year 5% of the city population moves to the suburbs, and 3% of the
suburban population moves to the city. The columns of M are probability vectors,
so M is a stochastic matrix. Suppose the 2000 population of the region is 600,000 in
the city and 400,000 in the suburbs. Then the initial distribution of the population in the
region is given by \(\mathbf{x}_0\) in (1) above. What is the distribution of the population in 2001?
In 2002?

FIGURE 1 Annual percentage migration between city and suburbs.


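SOLUTION In Example 3 of Section 1.10, we saw that after one year, the population
vector \(\begin{bmatrix} 600{,}000 \\ 400{,}000 \end{bmatrix}\) changed to

\[ \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}
\begin{bmatrix} 600{,}000 \\ 400{,}000 \end{bmatrix}
= \begin{bmatrix} 582{,}000 \\ 418{,}000 \end{bmatrix} \]

If we divide both sides of this equation by the total population of 1 million, and use the
fact that \(kM\mathbf{x} = M(k\mathbf{x})\), we find that

\[ \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}
\begin{bmatrix} .600 \\ .400 \end{bmatrix}
= \begin{bmatrix} .582 \\ .418 \end{bmatrix} \]

The vector \(\mathbf{x}_1 = \begin{bmatrix} .582 \\ .418 \end{bmatrix}\) gives the population distribution in 2001. That is, 58.2% of
the region lived in the city and 41.8% lived in the suburbs. Similarly, the population
distribution in 2002 is described by a vector \(\mathbf{x}_2\), where

\[ \mathbf{x}_2 = M\mathbf{x}_1 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}
\begin{bmatrix} .582 \\ .418 \end{bmatrix}
= \begin{bmatrix} .565 \\ .435 \end{bmatrix} \]

■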


EXAMPLE 2 Suppose the voting results of a congressional election at a certain
voting precinct are represented by a vector \(\mathbf{x}\) in \(\mathbb{R}^3\):

\[ \mathbf{x} = \begin{bmatrix} \text{\% voting Democratic (D)} \\ \text{\% voting Republican (R)} \\ \text{\% voting Libertarian (L)} \end{bmatrix} \]

Suppose we record the outcome of the congressional election every two years by a vector
of this type and the outcome of one election depends only on the results of the preceding
election. Then the sequence of vectors that describe the votes every two years may be
a Markov chain. As an example of a stochastic matrix P for this chain, we take

\[ P = \begin{bmatrix} .70 & .10 & .30 \\ .20 & .80 & .30 \\ .10 & .10 & .40 \end{bmatrix} \]

(columns, From: D, R, L; rows, To: D, R, L)

The entries in the first column, labeled D, describe what the persons voting Democratic
in one election will do in the next election. Here we have supposed that 70% will vote D
again in the next election, 20% will vote R, and 10% will vote L. Similar interpretations
hold for the other columns of P. A diagram for this matrix is shown in Fig. 2.

FIGURE 2 Voting changes from one election to the next.

If the “transition” percentages remain constant over many years from one election
to the next, then the sequence of vectors that give the voting outcomes forms a Markov
chain. Suppose the outcome of one election is given by

\[ \mathbf{x}_0 = \begin{bmatrix} .55 \\ .40 \\ .05 \end{bmatrix} \]

Determine the likely outcome of the next election and the likely outcome of the election
after that.

SOLUTION The outcome of the next election is described by the state vector \(\mathbf{x}_1\) and
that of the election after that by \(\mathbf{x}_2\), where

\[ \mathbf{x}_1 = P\mathbf{x}_0 = \begin{bmatrix} .70 & .10 & .30 \\ .20 & .80 & .30 \\ .10 & .10 & .40 \end{bmatrix}
\begin{bmatrix} .55 \\ .40 \\ .05 \end{bmatrix}
= \begin{bmatrix} .440 \\ .445 \\ .115 \end{bmatrix}
\quad \begin{array}{l} \text{44\% will vote D.} \\ \text{44.5\% will vote R.} \\ \text{11.5\% will vote L.} \end{array} \]

\[ \mathbf{x}_2 = P\mathbf{x}_1 = \begin{bmatrix} .70 & .10 & .30 \\ .20 & .80 & .30 \\ .10 & .10 & .40 \end{bmatrix}
\begin{bmatrix} .440 \\ .445 \\ .115 \end{bmatrix}
= \begin{bmatrix} .3870 \\ .4785 \\ .1345 \end{bmatrix}
\quad \begin{array}{l} \text{38.7\% will vote D.} \\ \text{47.8\% will vote R.} \\ \text{13.5\% will vote L.} \end{array} \]

To understand why \(\mathbf{x}_1\) does indeed give the outcome of the next election, suppose 1000
persons voted in the “first” election, with 550 voting D, 400 voting R, and 50 voting L.
(See the percentages in \(\mathbf{x}_0\).) In the next election, 70% of the 550 will vote D again, 10%
of the 400 will switch from R to D, and 30% of the 50 will switch from L to D. Thus
the total D vote will be

\[ .70(550) + .10(400) + .30(50) = 385 + 40 + 15 = 440 \qquad (2) \]

Thus 44% of the vote next time will be for the D candidate. The calculation in (2) is
essentially the same as that used to compute the first entry in \(\mathbf{x}_1\). Analogous calculations
could be made for the other entries in \(\mathbf{x}_1\), for the entries in \(\mathbf{x}_2\), and so on. ■
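
The two matrix-vector products in Example 2 can be reproduced with a short Python sketch. It is offered only as an illustration: the helper name `mat_vec` is hypothetical, and exact fractions are used (an implementation choice, not something the text prescribes) so that no rounding enters the comparison with the values above.

```python
from fractions import Fraction

def mat_vec(P, x):
    """Multiply the matrix P (a list of rows) by the vector x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

# Stochastic matrix and initial state vector from Example 2.
P = [[Fraction(s) for s in row] for row in
     [["0.70", "0.10", "0.30"],
      ["0.20", "0.80", "0.30"],
      ["0.10", "0.10", "0.40"]]]
x0 = [Fraction("0.55"), Fraction("0.40"), Fraction("0.05")]

x1 = mat_vec(P, x0)   # next election
x2 = mat_vec(P, x1)   # the election after that
print([float(v) for v in x1])
print([float(v) for v in x2])
```

Each entry of `x1` is computed exactly as in calculation (2): a row of P is weighted by the entries of the previous state vector.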


Predicting the Distant Future 

The most interesting aspect of Markov chains is the study of a chain’s long-term 
behavior. For instance, what can be said in Example 2 about the voting after many 
elections have passed (assuming that the given stochastic matrix continues to describe 
the transition percentages from one election to the next)? Or, what happens to the 
population distribution in Example 1 “in the long run”? Before answering these
questions, we turn to a numerical example. 

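EXAMPLE 3 Let

\[ P = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix}
\quad \text{and} \quad
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]

Consider a system whose state is described by the Markov chain \(\mathbf{x}_{k+1} = P\mathbf{x}_k\), for
k = 0, 1, .... What happens to the system as time passes? Compute the state vectors
\(\mathbf{x}_1, \ldots, \mathbf{x}_{15}\) to find out.

SOLUTION

\[ \mathbf{x}_1 = P\mathbf{x}_0 = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} .5 \\ .3 \\ .2 \end{bmatrix} \]

\[ \mathbf{x}_2 = P\mathbf{x}_1 = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix}
\begin{bmatrix} .5 \\ .3 \\ .2 \end{bmatrix} = \begin{bmatrix} .37 \\ .45 \\ .18 \end{bmatrix} \]

\[ \mathbf{x}_3 = P\mathbf{x}_2 = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix}
\begin{bmatrix} .37 \\ .45 \\ .18 \end{bmatrix} = \begin{bmatrix} .329 \\ .525 \\ .146 \end{bmatrix} \]

The results of further calculations are shown below, with entries rounded to four or five
significant figures.

\[ \mathbf{x}_4 = \begin{bmatrix} .3133 \\ .5625 \\ .1242 \end{bmatrix}, \quad
\mathbf{x}_5 = \begin{bmatrix} .3064 \\ .5813 \\ .1123 \end{bmatrix}, \quad
\mathbf{x}_6 = \begin{bmatrix} .3032 \\ .5906 \\ .1062 \end{bmatrix}, \quad
\mathbf{x}_7 = \begin{bmatrix} .3016 \\ .5953 \\ .1031 \end{bmatrix} \]

\[ \mathbf{x}_8 = \begin{bmatrix} .3008 \\ .5977 \\ .1016 \end{bmatrix}, \quad
\mathbf{x}_9 = \begin{bmatrix} .3004 \\ .5988 \\ .1008 \end{bmatrix}, \quad
\mathbf{x}_{10} = \begin{bmatrix} .3002 \\ .5994 \\ .1004 \end{bmatrix}, \quad
\mathbf{x}_{11} = \begin{bmatrix} .3001 \\ .5997 \\ .1002 \end{bmatrix} \]

\[ \mathbf{x}_{12} = \begin{bmatrix} .30005 \\ .59985 \\ .10010 \end{bmatrix}, \quad
\mathbf{x}_{13} = \begin{bmatrix} .30002 \\ .59993 \\ .10005 \end{bmatrix}, \quad
\mathbf{x}_{14} = \begin{bmatrix} .30001 \\ .59996 \\ .10002 \end{bmatrix}, \quad
\mathbf{x}_{15} = \begin{bmatrix} .30001 \\ .59998 \\ .10001 \end{bmatrix} \]

These vectors seem to be approaching \(\mathbf{q} = \begin{bmatrix} .3 \\ .6 \\ .1 \end{bmatrix}\). The probabilities are hardly
changing from one value of k to the next. Observe that the following calculation is
exact (with no rounding error):

\[ P\mathbf{q} = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix}
\begin{bmatrix} .3 \\ .6 \\ .1 \end{bmatrix}
= \begin{bmatrix} .15 + .12 + .03 \\ .09 + .48 + .03 \\ .06 + 0 + .04 \end{bmatrix}
= \begin{bmatrix} .30 \\ .60 \\ .10 \end{bmatrix} = \mathbf{q} \]

When the system is in state \(\mathbf{q}\), there is no change in the system from one measurement
to the next. ■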
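
The iteration in Example 3 is easy to automate. The Python sketch below is illustrative only (the name `mat_vec` is a hypothetical helper, and exact fractions are an implementation choice): it applies P fifteen times to the initial state and measures how far the result is from q.

```python
from fractions import Fraction

def mat_vec(P, x):
    """Multiply the matrix P (a list of rows) by the vector x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

# Matrix P and candidate limit q from Example 3.
P = [[Fraction(s) for s in row] for row in
     [["0.5", "0.2", "0.3"],
      ["0.3", "0.8", "0.3"],
      ["0.2", "0",   "0.4"]]]
q = [Fraction("0.3"), Fraction("0.6"), Fraction("0.1")]

x = [Fraction(1), Fraction(0), Fraction(0)]    # x_0
for _ in range(15):                            # after the loop, x holds x_15
    x = mat_vec(P, x)

gap = max(abs(xi - qi) for xi, qi in zip(x, q))
print(float(gap))              # largest entrywise distance from x_15 to q
print(mat_vec(P, q) == q)      # whether P leaves q unchanged
```

The small gap is consistent with the rounded table above, and the final comparison repeats the exact calculation Pq = q.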

Steady-State Vectors 

If P is a stochastic matrix, then a steady-state vector (or equilibrium vector) for P is
a probability vector \(\mathbf{q}\) such that

\[ P\mathbf{q} = \mathbf{q} \]

It can be shown that every stochastic matrix has a steady-state vector. In Example 3, q 
is a steady-state vector for P. 


EXAMPLE 4 The probability vector \(\mathbf{q} = \begin{bmatrix} .375 \\ .625 \end{bmatrix}\) is a steady-state vector for the
population migration matrix M in Example 1, because

\[ M\mathbf{q} = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}
\begin{bmatrix} .375 \\ .625 \end{bmatrix}
= \begin{bmatrix} .35625 + .01875 \\ .01875 + .60625 \end{bmatrix}
= \begin{bmatrix} .375 \\ .625 \end{bmatrix} = \mathbf{q} \]

■

If the total population of the metropolitan region in Example 1 is 1 million, then
q from Example 4 would correspond to having 375,000 persons in the city and 625,000 
in the suburbs. At the end of one year, the migration out of the city would be 
(.05)(375,000) = 18,750 persons, and the migration into the city from the suburbs 
would be (.03)(625,000) = 18,750 persons. As a result, the population in the city would 
remain the same. Similarly, the suburban population would be stable. 

The next example shows how to find a steady-state vector. 


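EXAMPLE 5 Let \(P = \begin{bmatrix} .6 & .3 \\ .4 & .7 \end{bmatrix}\). Find a steady-state vector for P.

SOLUTION First, solve the equation \(P\mathbf{x} = \mathbf{x}\).

\[ P\mathbf{x} - \mathbf{x} = \mathbf{0} \]
\[ P\mathbf{x} - I\mathbf{x} = \mathbf{0} \qquad \text{Recall from Section 1.4 that } I\mathbf{x} = \mathbf{x}. \]
\[ (P - I)\mathbf{x} = \mathbf{0} \]

For P as above,

\[ P - I = \begin{bmatrix} .6 & .3 \\ .4 & .7 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} -.4 & .3 \\ .4 & -.3 \end{bmatrix} \]

To find all solutions of \((P - I)\mathbf{x} = \mathbf{0}\), row reduce the augmented matrix:

\[ \begin{bmatrix} -.4 & .3 & 0 \\ .4 & -.3 & 0 \end{bmatrix}
\sim \begin{bmatrix} -.4 & .3 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\sim \begin{bmatrix} 1 & -3/4 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

Then \(x_1 = \frac{3}{4}x_2\) and \(x_2\) is free. The general solution is \(x_2 \begin{bmatrix} 3/4 \\ 1 \end{bmatrix}\).

Next, choose a simple basis for the solution space. One obvious choice is \(\begin{bmatrix} 3/4 \\ 1 \end{bmatrix}\),
but a better choice with no fractions is \(\mathbf{w} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}\) (corresponding to \(x_2 = 4\)).

Finally, find a probability vector in the set of all solutions of \(P\mathbf{x} = \mathbf{x}\). This process
is easy, since every solution is a multiple of the solution \(\mathbf{w}\) above. Divide \(\mathbf{w}\) by the sum
of its entries and obtain

\[ \mathbf{q} = \begin{bmatrix} 3/7 \\ 4/7 \end{bmatrix} \]

As a check, compute

\[ P\mathbf{q} = \begin{bmatrix} 6/10 & 3/10 \\ 4/10 & 7/10 \end{bmatrix}
\begin{bmatrix} 3/7 \\ 4/7 \end{bmatrix}
= \begin{bmatrix} 18/70 + 12/70 \\ 12/70 + 28/70 \end{bmatrix}
= \begin{bmatrix} 30/70 \\ 40/70 \end{bmatrix} = \mathbf{q} \]

■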
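
The procedure of Example 5 can be sketched in Python with exact rational arithmetic. The function name `steady_state` is hypothetical, and the sketch takes one shortcut that differs from the hand method: instead of row reducing (P − I)x = 0 and rescaling afterward, it replaces one redundant row by the condition that the entries sum to 1. It assumes P has exactly one steady-state vector.

```python
from fractions import Fraction

def steady_state(P):
    """Solve (P - I)q = 0 together with sum(q) = 1 by Gauss-Jordan elimination.

    Assumes P is a stochastic matrix with a unique steady-state vector.
    """
    n = len(P)
    # Augmented matrix [P - I | 0] with exact fractions.
    M = [[Fraction(P[i][j]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n)]
    # The rows of P - I add up to the zero row, so one row is redundant;
    # replace it with the normalization q_1 + ... + q_n = 1.
    M[-1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [entry / M[col][col] for entry in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[-1] for row in M]

P = [["0.6", "0.3"], ["0.4", "0.7"]]    # matrix from Example 5
q = steady_state(P)
print(q)
```

Folding the normalization into the linear system avoids the separate step of dividing w by the sum of its entries; the hand computation in the example reaches the same vector.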


The next theorem shows that what happened in Example 3 is typical of many
stochastic matrices. We say that a stochastic matrix is regular if some matrix power
\(P^k\) contains only strictly positive entries. For P in Example 3,

\[ P^2 = \begin{bmatrix} .37 & .26 & .33 \\ .45 & .70 & .45 \\ .18 & .04 & .22 \end{bmatrix} \]

Since every entry in \(P^2\) is strictly positive, P is a regular stochastic matrix.

Also, we say that a sequence of vectors \(\{\mathbf{x}_k : k = 1, 2, \ldots\}\) converges to a vector
\(\mathbf{q}\) as \(k \to \infty\) if the entries in \(\mathbf{x}_k\) can be made as close as desired to the corresponding
entries in \(\mathbf{q}\) by taking k sufficiently large.

THEOREM 18

If P is an n × n regular stochastic matrix, then P has a unique steady-state vector
\(\mathbf{q}\). Further, if \(\mathbf{x}_0\) is any initial state and \(\mathbf{x}_{k+1} = P\mathbf{x}_k\) for k = 0, 1, 2, ..., then the
Markov chain \(\{\mathbf{x}_k\}\) converges to \(\mathbf{q}\) as \(k \to \infty\).


This theorem is proved in standard texts on Markov chains. The amazing part of 
the theorem is that the initial state has no effect on the long-term behavior of the Markov 
chain. You will see later (in Section 5.2) why this is true for several stochastic matrices 
studied here. 

EXAMPLE 6 In Example 2, what percentage of the voters are likely to vote for the
Republican candidate in some election many years from now, assuming that the election
outcomes form a Markov chain?

SOLUTION For computations by hand, the wrong approach is to pick some initial
vector \(\mathbf{x}_0\) and compute \(\mathbf{x}_1, \ldots, \mathbf{x}_k\) for some large value of k. You have no way of
knowing how many vectors to compute, and you cannot be sure of the limiting values
of the entries in \(\mathbf{x}_k\).

The correct approach is to compute the steady-state vector and then appeal to
Theorem 18. Given P as in Example 2, form P − I by subtracting 1 from each diagonal
entry in P. Then row reduce the augmented matrix:

\[ [\,(P - I) \quad \mathbf{0}\,] = \begin{bmatrix} -.3 & .1 & .3 & 0 \\ .2 & -.2 & .3 & 0 \\ .1 & .1 & -.6 & 0 \end{bmatrix} \]

Recall from earlier work with decimals that the arithmetic is simplified by multiplying
each row by 10.¹

\[ \begin{bmatrix} -3 & 1 & 3 & 0 \\ 2 & -2 & 3 & 0 \\ 1 & 1 & -6 & 0 \end{bmatrix}
\sim \begin{bmatrix} 1 & 0 & -9/4 & 0 \\ 0 & 1 & -15/4 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

The general solution of \((P - I)\mathbf{x} = \mathbf{0}\) is \(x_1 = \frac{9}{4}x_3\), \(x_2 = \frac{15}{4}x_3\), and \(x_3\) is free. Choosing
\(x_3 = 4\), we obtain a basis for the solution space whose entries are integers, and from
this we easily find the steady-state vector whose entries sum to 1:

\[ \mathbf{w} = \begin{bmatrix} 9 \\ 15 \\ 4 \end{bmatrix}, \quad \text{and} \quad
\mathbf{q} = \begin{bmatrix} 9/28 \\ 15/28 \\ 4/28 \end{bmatrix} \approx \begin{bmatrix} .32 \\ .54 \\ .14 \end{bmatrix} \]

The entries in \(\mathbf{q}\) describe the distribution of votes at an election to be held many years
from now (assuming the stochastic matrix continues to describe the changes from one
election to the next). Thus, eventually, about 54% of the vote will be for the Republican
candidate. ■

¹ Warning: Don’t multiply only P by 10. Instead, multiply the augmented matrix for equation
\((P - I)\mathbf{x} = \mathbf{0}\) by 10.

NUMERICAL NOTE

You may have noticed that if \(\mathbf{x}_{k+1} = P\mathbf{x}_k\) for k = 0, 1, ..., then

\[ \mathbf{x}_2 = P\mathbf{x}_1 = P(P\mathbf{x}_0) = P^2\mathbf{x}_0, \]

and, in general,

\[ \mathbf{x}_k = P^k\mathbf{x}_0 \quad \text{for } k = 0, 1, \ldots \]

To compute a specific vector such as \(\mathbf{x}_3\), fewer arithmetic operations are needed
to compute \(\mathbf{x}_1\), \(\mathbf{x}_2\), and \(\mathbf{x}_3\), rather than \(P^3\) and \(P^3\mathbf{x}_0\). However, if P is small (say,
30 × 30), the machine computation time is insignificant for both methods, and a
command to compute \(P^3\mathbf{x}_0\) might be preferred because it requires fewer human
keystrokes.
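
The two routes to \(\mathbf{x}_3\) described in the note can be compared directly. The Python sketch below is an illustration under stated assumptions: `mat_vec` and `mat_mul` are hypothetical helpers written in plain Python, and the data are P and \(\mathbf{x}_0\) from Example 3.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

P = [[Fraction(s) for s in row] for row in
     [["0.5", "0.2", "0.3"],
      ["0.3", "0.8", "0.3"],
      ["0.2", "0",   "0.4"]]]
x0 = [Fraction(1), Fraction(0), Fraction(0)]

# Method 1: three matrix-vector products (x_1, x_2, x_3 in turn).
stepwise = x0
for _ in range(3):
    stepwise = mat_vec(P, stepwise)

# Method 2: form P^3 first (two matrix-matrix products), then multiply by x_0.
P3 = mat_mul(P, mat_mul(P, P))
via_power = mat_vec(P3, x0)

print(stepwise == via_power)
```

For an n × n matrix, each matrix-vector product costs about n² multiplications while each matrix-matrix product costs about n³, which is the operation count behind the note's remark.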


PRACTICE PROBLEMS 


1. Suppose the residents of a metropolitan region move according to the probabilities
in the migration matrix M in Example 1 and a resident is chosen “at random.” Then
a state vector for a certain year may be interpreted as giving the probabilities that the
person is a city resident or a suburban resident at that time.

a. Suppose the person chosen is a city resident now, so that \(\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\). What is the
likelihood that the person will live in the suburbs next year?

b. What is the likelihood that the person will be living in the suburbs in two years?

2. Let \(P = \begin{bmatrix} .6 & .2 \\ .4 & .8 \end{bmatrix}\) and \(\mathbf{q} = \begin{bmatrix} .3 \\ .7 \end{bmatrix}\). Is \(\mathbf{q}\) a steady-state vector for P?


3. What percentage of the population in Example 1 will live in the suburbs after many 
years? 


4.9 EXERCISES 

1. A small remote village receives radio broadcasts from two 
radio stations, a news station and a music station. Of the 
listeners who are tuned to the news station, 70% will remain 
listening to the news after the station break that occurs each 
half hour, while 30% will switch to the music station at the 
station break. Of the listeners who are tuned to the music 
station, 60% will switch to the news station at the station 
break, while 40% will remain listening to the music. Suppose 
everyone is listening to the news at 8:15 A.M. 

a. Give the stochastic matrix that describes how the radio 
listeners tend to change stations at each station break. 
Label the rows and columns. 

b. Give the initial state vector. 

c. What percentage of the listeners will be listening to the 
music station at 9:25 A.M. (after the station breaks at 8:30 
and 9:00 A.M.)? 

2. A laboratory animal may eat any one of three foods each day. 
Laboratory records show that if the animal chooses one food 
on one trial, it will choose the same food on the next trial 


with a probability of 60%, and it will choose the other foods 
on the next trial with equal probabilities of 20%. 

a. What is the stochastic matrix for this situation? 

b. If the animal chooses food #1 on an initial trial, what is 
the probability that it will choose food #2 on the second 
trial after the initial trial? 



3. On any given day, a student is either healthy or ill. Of
the students who are healthy today, 95% will be healthy
tomorrow. Of the students who are ill today, 55% will still
be ill tomorrow.

a. What is the stochastic matrix for this situation? 

b. Suppose 20% of the students are ill on Monday. What 
fraction or percentage of the students are likely to be ill 
on Tuesday? On Wednesday? 

c. If a student is healthy today, what is the probability that 
he or she will be healthy two days from now? 


total population lived in California. What percentage of the
total population would eventually live in California if the
listed migration probabilities were to remain constant over
many years?

\[ \begin{bmatrix} .9821 & .0029 \\ .0179 & .9971 \end{bmatrix} \]

(columns, From: CA, Rest of U.S.; rows, To: California, Rest of U.S.)


4. The weather in Columbus is either good, indifferent, or bad 
on any given day. If the weather is good today, there is a 
40% chance it will be good tomorrow, a 30% chance it will 
be indifferent, and a 30% chance it will be bad. If the weather 
is indifferent today, there is a 50% chance it will be good 
tomorrow, and a 20% chance it will be indifferent. Finally, 
if the weather is bad today, there is a 30% chance it will be 
good tomorrow and a 40% chance it will be indifferent. 

a. What is the stochastic matrix for this situation? 

b. Suppose there is a 50% chance of good weather today 
and a 50% chance of indifferent weather. What are the 
chances of bad weather tomorrow? 

c. Suppose the predicted weather for Monday is 60% in¬ 
different weather and 40% bad weather. What are the 
chances for good weather on Wednesday? 


In Exercises 5-8, find the steady-state vector. 


5. 


7. 


.9 

_.7 

.2 


.5 

.5 

.1 

.8 

.1 


9. Determine if P 


10. Determine if P 


6 . 


8 . 


.4 

.6 

■•4 

0 

•6 


.8 

.2 

.5 

•5 

0 


.3 

.7 


is a regular stochastic matrix. 


is a regular stochastic matrix. 


11 . a. Find the steady-state vector for the Markov chain in 
Exercise 1. 

b. At some time late in the day, what fraction of the listeners 
will be listening to the news? 


12. Refer to Exercise 2. Which food will the animal prefer after 
many trials? 

13. a. Find the steady-state vector for the Markov chain in 

Exercise 3. 

b. What is the probability that after many days a specific 
student is ill? Does it matter if that person is ill today? 


14. Refer to Exercise 4. In the long run, how likely is it for the 
weather in Columbus to be good on a given day? 


15. [M] The Demographic Research Unit of the California State 
Department of Finance supplied data for the following mi¬ 
gration matrix, which describes the movement of the United 
States population during 1989. In 1989, about 11.7% of the 


16. [M] In Detroit, Hertz Rent A Car has a fleet of about 2000
cars. The pattern of rental and return locations is given by
the fractions in the table below. On a typical day, about how
many cars will be rented or ready to rent from the downtown
location?

            Cars Rented from:
City Airport   Downtown   Metro Airport     Returned to:
    .90           .01          .09          City Airport
    .01           .90          .01          Downtown
    .09           .09          .90          Metro Airport


17. Let 尸 be an « x w stochastic matrix. The following argument 
shows that the equation Px = x has a nontrivial solution. (In 
fact, a steady-state solution exists with nonnegative entries. 
A proof is given in some advanced texts.) Justify each 
assertion below. (Mention a theorem when appropriate.) 

a. If all the other rows of P — I are added to the bottom 
row, the result is a row of zeros. 

b. The rows of P — I are linearly dependent. 

c. The dimension of the row space of P — / is less than n. 

d. P — I has a nontrivial null space. 


18. Show that every $2 \times 2$ stochastic matrix has at least one
steady-state vector. Any such matrix can be written in the
form $P = \begin{bmatrix} 1-\alpha & \beta \\ \alpha & 1-\beta \end{bmatrix}$, where $\alpha$ and $\beta$ are constants
between 0 and 1. (There are two linearly independent
steady-state vectors if $\alpha = \beta = 0$. Otherwise, there is only one.)


19. Let S be the $1 \times n$ row matrix with a 1 in each column,

$S = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}$

a. Explain why a vector x in $\mathbb{R}^n$ is a probability vector if and
only if its entries are nonnegative and Sx = 1. (A $1 \times 1$
matrix such as the product Sx is usually written without
the matrix bracket symbols.)

b. Let P be an $n \times n$ stochastic matrix. Explain why
SP = S.

c. Let P be an n x n stochastic matrix, and let x be a 
probability vector. Show that Px is also a probability 
vector. 


20. Use Exercise 19 to show that if P is an $n \times n$ stochastic
matrix, then so is $P^2$.

21. [M] Examine powers of a regular stochastic matrix.

a. Compute $P^k$ for k = 2, 3, 4, 5, when

$P = \begin{bmatrix} .3355 & .3682 & .3067 & .0389 \\ .2663 & .2723 & .3277 & .5451 \\ .1935 & .1502 & .1589 & .2395 \\ .2047 & .2093 & .2067 & .1765 \end{bmatrix}$

Display calculations to four decimal places. What happens
to the columns of $P^k$ as k increases? Compute the
steady-state vector for P.

b. Compute $Q^k$ for k = 10, 20, ..., 80, when

$Q = \begin{bmatrix} .97 & .05 & .10 \\ 0 & .90 & .05 \\ .03 & .05 & .85 \end{bmatrix}$

(Stability for $Q^k$ to four decimal places may require
k = 116 or more.) Compute the steady-state vector for
Q. Conjecture what might be true for any regular stochastic
matrix.

c. Use Theorem 18 to explain what you found in parts (a) 
and (b). 

22. [M] Compare two methods for finding the steady-state vector 
q of a regular stochastic matrix P : (1) computing q as in 
Example 5, or (2) computing P k for some large value of k 
and using one of the columns of P k as an approximation for 
q. [The Study Guide describes a program nulbasis that almost 
automates method (1).] 

Experiment with the largest random stochastic matrices 
your matrix program will allow, and use k = 100 or some
other large value. For each method, describe the time you
need to enter the keystrokes and run your program. (Some
versions of MATLAB have commands flops and tic
... toc that record the number of floating point operations
and the total elapsed time MATLAB uses.) Contrast the 
advantages of each method, and state which you prefer. 


SOLUTIONS TO PRACTICE PROBLEMS 


1. a. Since 5% of the city residents will move to the suburbs within one year, there is 
a 5% chance of choosing such a person. Without further knowledge about the 
person, we say that there is a 5% chance the person will move to the suburbs. 
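This fact is contained in the second entry of the state vector $\mathbf{x}_1$, where

$\mathbf{x}_1 = M\mathbf{x}_0 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} .95 \\ .05 \end{bmatrix}$

b. The likelihood that the person will be living in the suburbs after two years is
9.6%, because

$\mathbf{x}_2 = M\mathbf{x}_1 = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix} \begin{bmatrix} .95 \\ .05 \end{bmatrix} = \begin{bmatrix} .904 \\ .096 \end{bmatrix}$

2. The steady-state vector satisfies Px = x. Since

$P\mathbf{q} = \begin{bmatrix} .6 & .2 \\ .4 & .8 \end{bmatrix} \begin{bmatrix} .3 \\ .7 \end{bmatrix} = \begin{bmatrix} .32 \\ .68 \end{bmatrix} \neq \mathbf{q}$

we conclude that q is not the steady-state vector for P.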

3. M in Example 1 is a regular stochastic matrix because its entries are all strictly 
positive. So we may use Theorem 18. We already know the steady-state vector 
from Example 4. Thus the population distribution vectors converge to

$\mathbf{q} = \begin{bmatrix} .375 \\ .625 \end{bmatrix}$

Eventually 62.5% of the population will live in the suburbs. 
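The convergence claimed in Practice Problem 3 can be checked numerically. The following is only a sketch in plain Python: the helper names `mat_vec` and `steady_state`, the tolerance, and the starting vector are illustrative assumptions, not part of the text. It repeatedly applies the migration matrix M from the solution above until the state vector stops changing.

```python
def mat_vec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def steady_state(M, x0, tol=1e-12, max_steps=10_000):
    """Iterate x_{k+1} = M x_k until successive vectors agree to within tol."""
    x = list(x0)
    for _ in range(max_steps):
        nxt = mat_vec(M, x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("iteration did not settle; M may not be regular")


# Migration matrix from Practice Problem 1 (columns sum to 1).
M = [[0.95, 0.03],
     [0.05, 0.97]]

# Start with everyone in the city; any probability vector should work
# because M is a regular stochastic matrix.
q = steady_state(M, [1.0, 0.0])
print([round(v, 4) for v in q])
```

For large matrices, solving (P - I)x = 0 directly is usually cheaper than iterating, which is the comparison Exercise 22 asks for.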


CHAPTER 4 SUPPLEMENTARY EXERCISES 


1. Mark each statement True or False. Justify each answer. 
(If true, cite appropriate facts or theorems. If false, explain 
why or give a counterexample that shows why the statement 
is not true in every case.) In parts (a)-(f), $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are
vectors in a nonzero finite-dimensional vector space V, and
$S = \{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$.

a. The set of all linear combinations of $\mathbf{v}_1, \ldots, \mathbf{v}_p$ is a vector
space.

b. If $\{\mathbf{v}_1, \ldots, \mathbf{v}_{p-1}\}$ spans V, then S spans V.

c. If $\{\mathbf{v}_1, \ldots, \mathbf{v}_{p-1}\}$ is linearly independent, then so is S.

d. If S is linearly independent, then S is a basis for V.

e. If Span S = V, then some subset of S is a basis for V.

f. If dim V = p and Span S = V, then S cannot be linearly 
dependent. 

g. A plane in $\mathbb{R}^3$ is a two-dimensional subspace.

h. The nonpivot columns of a matrix are always linearly 
dependent. 

i. Row operations on a matrix A can change the linear 
dependence relations among the rows of A. 

j. Row operations on a matrix can change the null space. 

k. The rank of a matrix equals the number of nonzero rows. 

l. If an $m \times n$ matrix A is row equivalent to an echelon matrix
U and if U has k nonzero rows, then the dimension
of the solution space of Ax = 0 is m - k.

m. If B is obtained from a matrix A by several elementary
row operations, then rank B = rank A.

n. The nonzero rows of a matrix A form a basis for Row A. 

o. If matrices A and B have the same reduced echelon form, 
then Row A = Row B. 

p. If H is a subspace of $\mathbb{R}^3$, then there is a $3 \times 3$ matrix A
such that H = Col A.

q. If A is $m \times n$ and rank A = m, then the linear transformation
$\mathbf{x} \mapsto A\mathbf{x}$ is one-to-one.

r. If A is $m \times n$ and the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ is
onto, then rank A = m.

s. A change-of-coordinates matrix is always invertible.

t. If $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and $\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$ are bases for a
vector space V, then the jth column of the change-of-coordinates
matrix is the coordinate vector $[\mathbf{c}_j]_{\mathcal{B}}$.


2. Find a basis for the set of all vectors of the form

$\begin{bmatrix} a - 2b + 5c \\ 2a + 5b - 8c \\ -a - 4b + 7c \\ 3a + b + c \end{bmatrix}$. (Be careful.)


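3. Let $\mathbf{u}_1 = \begin{bmatrix} -2 \\ 4 \\ -6 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$, and
$W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$. Find an implicit description of W; that
is, find a set of one or more homogeneous equations that
characterize the points of W. [Hint: When is b in W?]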


4. Explain what is wrong with the following discussion: Let
$f(t) = 3 + t$ and $g(t) = 3t + t^2$, and note that $g(t) = t f(t)$.
Then {f, g} is linearly dependent because g is a multiple of f.


5. Consider the polynomials $p_1(t) = 1 + t$, $p_2(t) = 1 - t$,
$p_3(t) = 4$, $p_4(t) = t + t^2$, and $p_5(t) = 1 + 2t + t^2$, and
let H be the subspace of $\mathbb{P}_5$ spanned by the set
$S = \{p_1, p_2, p_3, p_4, p_5\}$. Use the method described in the
proof of the Spanning Set Theorem (Section 4.3) to produce
a basis for H. (Explain how to select appropriate members
of S.)

6. Suppose $p_1, p_2, p_3, p_4$ are specific polynomials that span a
two-dimensional subspace H of $\mathbb{P}_5$. Describe how one can
find a basis for H by examining the four polynomials and 
making almost no computations. 

7. What would you have to know about the solution set of a 
homogeneous system of 18 linear equations in 20 variables 
in order to know that every associated nonhomogeneous 
equation has a solution? Discuss. 

8. Let H be an n-dimensional subspace of an n-dimensional
vector space V. Explain why H = V.

9. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation.

a. What is the dimension of the range of T if T is a
one-to-one mapping? Explain.

b. What is the dimension of the kernel of T (see Section 4.2)
if T maps $\mathbb{R}^n$ onto $\mathbb{R}^m$? Explain.

10. Let S be a maximal linearly independent subset of a vector
space V. That is, S has the property that if a vector not in S
is adjoined to S, then the new set will no longer be linearly
independent. Prove that S must be a basis for V. [Hint: What
if S were linearly independent but not a basis of V?]

11. Let S be a finite minimal spanning set of a vector space V.
That is, S has the property that if a vector is removed from 
S, then the new set will no longer span V. Prove that S must 
be a basis for V. 

Exercises 12-17 develop properties of rank that are sometimes 
needed in applications. Assume the matrix A is m x n. 

12. Show from parts (a) and (b) that rank AB cannot exceed the
rank of A or the rank of B. (In general, the rank of a
product of matrices cannot exceed the rank of any factor in
the product.)

a. Show that if B is $n \times p$, then rank AB $\le$ rank A. [Hint:
Explain why every vector in the column space of AB is in
the column space of A.]

b. Show that if B is $n \times p$, then rank AB $\le$ rank B. [Hint:
Use part (a) to study rank $(AB)^T$.]

13. Show that if P is an invertible $m \times m$ matrix, then
rank PA = rank A. [Hint: Apply Exercise 12 to PA and

14. Show that if Q is invertible, then rank AQ = rank A. [Hint:
Use Exercise 13 to study rank $(AQ)^T$.]

15. Let A be an m x n matrix, and let B be an n x p matrix 
such that AB = 0. Show that rank A + rank B $\le$ n. [Hint:
One of the four subspaces Nul A, Col A, Nul B, and Col B is 
contained in one of the other three subspaces.] 

16. If A is an $m \times n$ matrix of rank r, then a rank factorization
of A is an equation of the form A = CR, where C is an
$m \times r$ matrix of rank r and R is an $r \times n$ matrix of rank r.
Such a factorization always exists (Exercise 38 in Section
4.6). Given any two $m \times n$ matrices A and B, use rank
factorizations of A and B to prove that

$\operatorname{rank}(A + B) \le \operatorname{rank} A + \operatorname{rank} B$

[Hint: Write A + B as the product of two partitioned matrices.]

17. A submatrix of a matrix A is any matrix that results from 
deleting some (or no) rows and/or columns of A. It can be 
shown that A has rank r if and only if A contains an invertible 
r x r submatrix and no larger square submatrix is invertible. 
Demonstrate part of this statement by explaining (a) why 
an $m \times n$ matrix A of rank r has an $m \times r$ submatrix $A_1$ of
rank r, and (b) why $A_1$ has an invertible $r \times r$ submatrix $A_2$.

The concept of rank plays an important role in the design of 
engineering control systems, such as the space shuttle system 
mentioned in this chapter’s introductory example. A state-space 
model of a control system includes a difference equation of the 
form 

$\mathbf{x}_{k+1} = A\mathbf{x}_k + B\mathbf{u}_k$ for k = 0, 1, ... (1)

where A is $n \times n$, B is $n \times m$, $\{\mathbf{x}_k\}$ is a sequence of “state vectors”
in $\mathbb{R}^n$ that describe the state of the system at discrete times, and
$\{\mathbf{u}_k\}$ is a control, or input, sequence. The pair (A, B) is said to be
controllable if

$\operatorname{rank} \begin{bmatrix} B & AB & A^2B & \cdots & A^{n-1}B \end{bmatrix} = n$ (2)

The matrix that appears in (2) is called the controllability matrix
for the system. If (A, B) is controllable, then the system can be
controlled, or driven from the state 0 to any specified state v (in
$\mathbb{R}^n$) in at most n steps, simply by choosing an appropriate control
sequence in $\mathbb{R}^m$. This fact is illustrated in Exercise 18 for n = 4


and m = 2. For a further discussion of controllability, see this 
text’s web site (Case Study for Chapter 4). 
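The rank test in (2) is mechanical enough to sketch in code. The example below is an illustration under stated assumptions: the function names are hypothetical, the small pair (A, B) is made up for the demonstration and is not one of the exercises, and exact arithmetic with `Fraction` is used so that rank decisions are not affected by roundoff.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def rank(M):
    """Rank by forward elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def controllability_matrix(A, B):
    """Build the partitioned matrix [B  AB  A^2 B ... A^(n-1) B]."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(mat_mul(A, blocks[-1]))
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


def is_controllable(A, B):
    return rank(controllability_matrix(A, B)) == len(A)


# Made-up 2 x 2 illustration (not taken from the exercises).
A = [[0, 1],
     [0, 0]]
print(is_controllable(A, [[0], [1]]))  # [B AB] = [[0, 1], [1, 0]]
print(is_controllable(A, [[1], [0]]))  # [B AB] = [[1, 0], [0, 0]]
```

Entries with decimals are best passed as strings or `Fraction` objects (for example `Fraction("0.9")`) so that they are represented exactly.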



18. Suppose A is a $4 \times 4$ matrix and B is a $4 \times 2$ matrix, and let
$\mathbf{u}_0, \ldots, \mathbf{u}_3$ represent a sequence of input vectors in $\mathbb{R}^2$.

a. Set $\mathbf{x}_0 = \mathbf{0}$, compute $\mathbf{x}_1, \ldots, \mathbf{x}_4$ from equation (1), and
write a formula for $\mathbf{x}_4$ involving the controllability matrix
M appearing in equation (2). (Note: The matrix M is
constructed as a partitioned matrix. Its overall size here
is $4 \times 8$.)

b. Suppose (A, B) is controllable and v is any vector in $\mathbb{R}^4$.
Explain why there exists a control sequence $\mathbf{u}_0, \ldots, \mathbf{u}_3$ in
$\mathbb{R}^2$ such that $\mathbf{x}_4 = \mathbf{v}$.


Determine if the matrix pairs in Exercises 19-22 are controllable. 

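19. $A = \begin{bmatrix} .9 & 1 & 0 \\ 0 & -.9 & 0 \\ 0 & 0 & .5 \end{bmatrix}$, $B = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$

20. $A = \begin{bmatrix} .8 & -.3 & 0 \\ .2 & .5 & 1 \\ 0 & 0 & -.5 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$

21. [M] $A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -2 & -4.2 & -4.8 & -3.6 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix}$

22. [M] $A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & -13 & -12.2 & -1.5 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix}$


Eigenvalues and Eigenvectors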


INTRODUCTORY EXAMPLE 

Dynamical Systems and Spotted Owls 



In 1990, the northern spotted owl became the center of 
a nationwide controversy over the use and misuse of the 
majestic forests in the Pacific Northwest. Environmentalists
convinced the federal government that the owl was
threatened with extinction if logging continued in the old- 
growth forests (with trees over 200 years old), where the 
owls prefer to live. The timber industry, anticipating 
the loss of 30,000 to 100,000 jobs as a result of new 
government restrictions on logging, argued that the owl 
should not be classified as a “threatened species” and cited 
a number of published scientific reports to support its case. 1 

Caught in the crossfire of the two lobbying groups, 
mathematical ecologists intensified their drive to understand
the population dynamics of the spotted owl. The
life cycle of a spotted owl divides naturally into three 
stages: juvenile (up to 1 year old), subadult (1 to 2 years), 
and adult (over 2 years). The owls mate for life during 
the subadult and adult stages, begin to breed as adults, 
and live for up to 20 years. Each owl pair requires about 
1000 hectares (4 square miles) for its own home territory. 
A critical time in the life cycle is when the juveniles leave 
the nest. To survive and become a subadult, a juvenile 
must successfully find a new home range (and usually a 
mate). 


A first step in studying the population dynamics is to 
model the population at yearly intervals, at times denoted 
by k = 0,1,2,.... Usually, one assumes that there is a 1:1 
ratio of males to females in each life stage and counts only 
the females. The population at year k can be described 
by a vector $\mathbf{x}_k = (j_k, s_k, a_k)$, where $j_k$, $s_k$, and $a_k$ are the
numbers of females in the juvenile, subadult, and adult 
stages, respectively. 

Using actual field data from demographic studies, 

R. Lamberson and co-workers considered the following 
stage-matrix model: 2 




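$\begin{bmatrix} j_{k+1} \\ s_{k+1} \\ a_{k+1} \end{bmatrix} = \begin{bmatrix} 0 & 0 & .33 \\ .18 & 0 & 0 \\ 0 & .71 & .94 \end{bmatrix} \begin{bmatrix} j_k \\ s_k \\ a_k \end{bmatrix}$

Here the number of new juvenile females in year k + 1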
is .33 times the number of adult females in year k (based 
on the average birth rate per owl pair). Also, 18% of the 
juveniles survive to become subadults, and 71% of the 
subadults and 94% of the adults survive to be counted as 
adults. 
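The stage-matrix model can be explored by direct iteration. This is a rough sketch only: the starting population of 100 females in each stage is a hypothetical value chosen for illustration, and the rates are the ones quoted in the paragraph above (.33, 18%, 71%, 94%).

```python
def step(A, x):
    """One year of the model: return A times the population vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


# Stage matrix built from the rates quoted in the text.
A = [[0.0, 0.0, 0.33],   # new juveniles come from adults
     [0.18, 0.0, 0.0],   # juveniles surviving to become subadults
     [0.0, 0.71, 0.94]]  # subadults and adults surviving as adults

x = [100.0, 100.0, 100.0]  # hypothetical initial (juvenile, subadult, adult)
totals = [sum(x)]
for k in range(200):
    x = step(A, x)
    totals.append(sum(x))

# Long-run year-over-year growth factor of the total population.
growth = totals[200] / totals[199]
print(round(growth, 4))
```

In this run the growth factor settles slightly below 1, which is consistent with the slow decline the chapter attributes to the model. The total can rise in the first few years before the long-run trend takes over.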

The stage-matrix model is a difference equation of the
form $\mathbf{x}_{k+1} = A\mathbf{x}_k$. Such an equation is often called a
dynamical system (or a discrete linear dynamical
system) because it describes the changes in a system as
time passes.

1 “The Great Spotted Owl War,” Reader’s Digest, November 1992,
pp. 91-95.

2 R. H. Lamberson, R. McKelvey, B. R. Noon, and C. Voss, “A Dynamic
Analysis of the Viability of the Northern Spotted Owl in a Fragmented
Forest Environment,” Conservation Biology 6 (1992), 505-512. Also, a
private communication from Professor Lamberson, 1993.

The 18% juvenile survival rate in the Lamberson stage 
matrix is the entry affected most by the amount of old- 
growth forest available. Actually, 60% of the juveniles 
normally survive to leave the nest, but in the Willow 
Creek region of California studied by Lamberson and his 
colleagues, only 30% of the juveniles that left the nest were 
able to find new home ranges. The rest perished during the 
search process. 


A significant reason for the failure of owls to find new 
home ranges is the increasing fragmentation of old-growth 
timber stands due to clear-cutting of scattered areas on 
the old-growth land. When an owl leaves the protective 
canopy of the forest and crosses a clear-cut area, the risk of 
attack by predators increases dramatically. Section 5.6 will 
show that the model described above predicts the eventual 
demise of the spotted owl, but that if 50% of the juveniles 
who survive to leave the nest also find new home ranges, 
then the owl population will thrive. 




The goal of this chapter is to dissect the action of a linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ into
elements that are easily visualized. Except for a brief digression in Section 5.4, all 
matrices in the chapter are square. The main applications described here are to discrete 
dynamical systems, including the spotted owls discussed above. However, the basic 
concepts, eigenvectors and eigenvalues, are useful throughout pure and applied mathematics,
and they appear in settings far more general than we consider here. Eigenvalues
are also used to study differential equations and continuous dynamical systems, they 
provide critical information in engineering design, and they arise naturally in fields such 
as physics and chemistry. 


5.1 EIGENVECTORS AND EIGENVALUES 

Although a transformation $\mathbf{x} \mapsto A\mathbf{x}$ may move vectors in a variety of directions, it often
happens that there are special vectors on which the action of A is quite simple. 


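EXAMPLE 1 Let $A = \begin{bmatrix} 3 & -2 \\ 1 & 0 \end{bmatrix}$, $\mathbf{u} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, and $\mathbf{v} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. The images of u and
v under multiplication by A are shown in Fig. 1. In fact, Av is just 2v. So A only
“stretches,” or dilates, v. ■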



FIGURE 1 Effects of multiplication by A. 


As another example, readers of Section 4.9 will recall that if A is a stochastic matrix,
then the steady-state vector q for A satisfies the equation Ax = x. That is, $A\mathbf{q} = 1 \cdot \mathbf{q}$.
This section studies equations such as

Ax = 2x or Ax = -4x

where special vectors are transformed by A into scalar multiples of themselves.


An eigenvector of an $n \times n$ matrix A is a nonzero vector x such that $A\mathbf{x} = \lambda\mathbf{x}$
for some scalar $\lambda$. A scalar $\lambda$ is called an eigenvalue of A if there is a nontrivial
solution x of $A\mathbf{x} = \lambda\mathbf{x}$; such an x is called an eigenvector corresponding to $\lambda$.1


It is easy to determine if a given vector is an eigenvector of a matrix. It is also easy 
to decide if a specified scalar is an eigenvalue. 



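EXAMPLE 2 Let $A = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix}$, $\mathbf{u} = \begin{bmatrix} 6 \\ -5 \end{bmatrix}$, and $\mathbf{v} = \begin{bmatrix} 3 \\ -2 \end{bmatrix}$. Are u and v eigenvectors
of A?

SOLUTION

$A\mathbf{u} = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ -5 \end{bmatrix} = \begin{bmatrix} -24 \\ 20 \end{bmatrix} = -4 \begin{bmatrix} 6 \\ -5 \end{bmatrix} = -4\mathbf{u}$

$A\mathbf{v} = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ -2 \end{bmatrix} = \begin{bmatrix} -9 \\ 11 \end{bmatrix} \neq \lambda \begin{bmatrix} 3 \\ -2 \end{bmatrix}$

Thus u is an eigenvector corresponding to an eigenvalue (-4), but v is not an eigenvector
of A, because Av is not a multiple of v. ■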
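The check carried out in Example 2 is easy to automate. The sketch below assumes integer entries and uses a hypothetical helper name, `eigenvalue_for`. It multiplies, reads off the only possible scalar from the first nonzero entry, and then confirms that the same scalar works for every entry.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def eigenvalue_for(A, x):
    """Return the scalar lam with A x = lam x, or None if x is not an eigenvector."""
    if all(v == 0 for v in x):
        return None  # the zero vector is excluded by definition
    Ax = mat_vec(A, x)
    i = next(k for k, v in enumerate(x) if v != 0)
    lam = Fraction(Ax[i], x[i])  # the only candidate scalar
    return lam if all(a == lam * v for a, v in zip(Ax, x)) else None


A = [[1, 6],
     [5, 2]]
print(eigenvalue_for(A, [6, -5]))  # u from Example 2
print(eigenvalue_for(A, [3, -2]))  # v from Example 2
```

The first call reports -4 and the second reports None, matching the conclusion of the example.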


EXAMPLE 3 Show that 7 is an eigenvalue of matrix A in Example 2, and find the 
corresponding eigenvectors. 

SOLUTION The scalar 7 is an eigenvalue of A if and only if the equation

Ax = 7x (1)

has a nontrivial solution. But (1) is equivalent to Ax - 7x = 0, or

(A - 7I)x = 0 (2)

To solve this homogeneous equation, form the matrix

$A - 7I = \begin{bmatrix} 1 & 6 \\ 5 & 2 \end{bmatrix} - \begin{bmatrix} 7 & 0 \\ 0 & 7 \end{bmatrix} = \begin{bmatrix} -6 & 6 \\ 5 & -5 \end{bmatrix}$

The columns of A - 7I are obviously linearly dependent, so (2) has nontrivial solutions.
Thus 7 is an eigenvalue of A. To find the corresponding eigenvectors, use row
operations:

$\begin{bmatrix} -6 & 6 & 0 \\ 5 & -5 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

The general solution has the form $x_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Each vector of this form with $x_2 \neq 0$ is an
eigenvector corresponding to $\lambda = 7$. ■


1 Note that an eigenvector must be nonzero, by definition, but an eigenvalue may be zero. The case in which
the number 0 is an eigenvalue is discussed after Example 5.

Warning: Although row reduction was used in Example 3 to find eigenvectors, it
cannot be used to find eigenvalues. An echelon form of a matrix A usually does not
display the eigenvalues of A.

The equivalence of equations (1) and (2) obviously holds for any $\lambda$ in place of
$\lambda = 7$. Thus $\lambda$ is an eigenvalue of an $n \times n$ matrix A if and only if the equation

$(A - \lambda I)\mathbf{x} = \mathbf{0}$ (3)

has a nontrivial solution. The set of all solutions of (3) is just the null space of the matrix
$A - \lambda I$. So this set is a subspace of $\mathbb{R}^n$ and is called the eigenspace of A corresponding
to $\lambda$. The eigenspace consists of the zero vector and all the eigenvectors corresponding
to $\lambda$.

Example 3 shows that for matrix A in Example 2, the eigenspace corresponding to
$\lambda = 7$ consists of all multiples of (1, 1), which is the line through (1, 1) and the origin.
From Example 2, you can check that the eigenspace corresponding to $\lambda = -4$ is the
line through (6, -5). These eigenspaces are shown in Fig. 2, along with eigenvectors
(1, 1) and (3/2, -5/4) and the geometric action of the transformation $\mathbf{x} \mapsto A\mathbf{x}$ on each
eigenspace.







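EXAMPLE 4 Let $A = \begin{bmatrix} 4 & -1 & 6 \\ 2 & 1 & 6 \\ 2 & -1 & 8 \end{bmatrix}$. An eigenvalue of A is 2. Find a basis for
the corresponding eigenspace.

SOLUTION Form

$A - 2I = \begin{bmatrix} 4 & -1 & 6 \\ 2 & 1 & 6 \\ 2 & -1 & 8 \end{bmatrix} - \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 2 & -1 & 6 \\ 2 & -1 & 6 \\ 2 & -1 & 6 \end{bmatrix}$

and row reduce the augmented matrix for (A - 2I)x = 0:

$\begin{bmatrix} 2 & -1 & 6 & 0 \\ 2 & -1 & 6 & 0 \\ 2 & -1 & 6 & 0 \end{bmatrix} \sim \begin{bmatrix} 2 & -1 & 6 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$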


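At this point, it is clear that 2 is indeed an eigenvalue of A because the equation
(A - 2I)x = 0 has free variables. The general solution is

$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_2 \begin{bmatrix} 1/2 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}$, $x_2$ and $x_3$ free

The eigenspace, shown in Fig. 3, is a two-dimensional subspace of $\mathbb{R}^3$. A basis is

$\left\{ \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} \right\}$ ■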



FIGURE 3 A acts as a dilation on the eigenspace. 
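The method of Example 4 (form A - λI, row reduce, and read one basis vector off each free variable) can be sketched as follows. The function names are illustrative assumptions, and exact `Fraction` arithmetic is used so that no pivot is lost to roundoff.

```python
from fractions import Fraction


def rref(M):
    """Reduced echelon form in exact arithmetic; returns (rows, pivot_columns)."""
    rows = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # column c has no pivot, so x_c is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [v / piv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots


def eigenspace_basis(A, lam):
    """Basis for Nul(A - lam*I); an empty list means lam is not an eigenvalue."""
    n = len(A)
    shifted = [[Fraction(A[i][j]) - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    R, pivots = rref(shifted)
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)            # set one free variable to 1
        for row, p in zip(R, pivots):  # pivot rows are the first rows of R
            v[p] = -row[f]            # solve for each basic variable
        basis.append(v)
    return basis


A = [[4, -1, 6],
     [2, 1, 6],
     [2, -1, 8]]
basis = eigenspace_basis(A, 2)
print(basis)
```

The vectors produced are (1/2, 1, 0) and (-3, 0, 1), the same ones that appear in the general solution above; scaling the first by 2 gives an integer basis.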


NUMERICAL NOTE

Example 4 shows a good method for manual computation of eigenvectors in 
simple cases when an eigenvalue is known. Using a matrix program and row 
reduction to find an eigenspace (for a specified eigenvalue) usually works, too, 
but this is not entirely reliable. Roundoff error can lead occasionally to a reduced 
echelon form with the wrong number of pivots. The best computer programs 
compute approximations for eigenvalues and eigenvectors simultaneously, to 
any desired degree of accuracy, for matrices that are not too large. The size 
of matrices that can be analyzed increases each year as computing power and 
software improve. 


The following theorem describes one of the few special cases in which eigenvalues 
can be found precisely. Calculation of eigenvalues will also be discussed in Section 5.2. 


THEOREM 1 The eigenvalues of a triangular matrix are the entries on its main diagonal.


PROOF For simplicity, consider the $3 \times 3$ case. If A is upper triangular, then $A - \lambda I$
has the form

$A - \lambda I = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix} - \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix} = \begin{bmatrix} a_{11} - \lambda & a_{12} & a_{13} \\ 0 & a_{22} - \lambda & a_{23} \\ 0 & 0 & a_{33} - \lambda \end{bmatrix}$

The scalar $\lambda$ is an eigenvalue of A if and only if the equation $(A - \lambda I)\mathbf{x} = \mathbf{0}$ has a
nontrivial solution, that is, if and only if the equation has a free variable. Because of the
zero entries in $A - \lambda I$, it is easy to see that $(A - \lambda I)\mathbf{x} = \mathbf{0}$ has a free variable if and
only if at least one of the entries on the diagonal of $A - \lambda I$ is zero. This happens if and
only if $\lambda$ equals one of the entries $a_{11}$, $a_{22}$, $a_{33}$ in A. For the case in which A is lower
triangular, see Exercise 28. ■


EXAMPLE 5 Let $A = \begin{bmatrix} 3 & 6 & -8 \\ 0 & 0 & 6 \\ 0 & 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 0 & 0 \\ -2 & 1 & 0 \\ 5 & 3 & 4 \end{bmatrix}$. The eigenvalues
of A are 3, 0, and 2. The eigenvalues of B are 4 and 1. ■
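Theorem 1 can be spot-checked on the matrices of Example 5. In the sketch below the helper names are assumptions made for illustration, and the determinant test for invertibility is a standard fact used here as a shortcut rather than the free-variable argument of the proof. Each diagonal entry is confirmed to make A - λI singular.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def shift(M, lam):
    """Return M - lam * I."""
    return [[M[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]


def diagonal(M):
    return [M[k][k] for k in range(3)]


A = [[3, 6, -8],
     [0, 0, 6],
     [0, 0, 2]]
B = [[4, 0, 0],
     [-2, 1, 0],
     [5, 3, 4]]

for name, M in (("A", A), ("B", B)):
    eigs = sorted(set(diagonal(M)))
    # Every diagonal entry should make M - lam*I singular.
    print(name, eigs, [det3(shift(M, lam)) for lam in eigs])
```

Note that B lists 4 twice on its diagonal but has only two distinct eigenvalues, 4 and 1, as stated in the example.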


What does it mean for a matrix A to have an eigenvalue of 0, such as in Example 5? 
This happens if and only if the equation 

Ax = 0x (4)

has a nontrivial solution. But (4) is equivalent to Ax = 0, which has a nontrivial solution 
if and only if A is not invertible. Thus 0 is an eigenvalue of A if and only if A is not 
invertible. This fact will be added to the Invertible Matrix Theorem in Section 5.2. 

The following important theorem will be needed later. Its proof illustrates a typical 
calculation with eigenvectors. 

THEOREM 2 If $\mathbf{v}_1, \ldots, \mathbf{v}_r$ are eigenvectors that correspond to distinct eigenvalues $\lambda_1, \ldots, \lambda_r$
of an $n \times n$ matrix A, then the set $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$ is linearly independent.


PROOF Suppose $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$ is linearly dependent. Since $\mathbf{v}_1$ is nonzero, Theorem 7 in
Section 1.7 says that one of the vectors in the set is a linear combination of the preceding
vectors. Let p be the least index such that $\mathbf{v}_{p+1}$ is a linear combination of the preceding
(linearly independent) vectors. Then there exist scalars $c_1, \ldots, c_p$ such that

$c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{v}_{p+1}$ (5)

Multiplying both sides of (5) by A and using the fact that $A\mathbf{v}_k = \lambda_k\mathbf{v}_k$ for each k, we
obtain

$c_1 A\mathbf{v}_1 + \cdots + c_p A\mathbf{v}_p = A\mathbf{v}_{p+1}$

$c_1\lambda_1\mathbf{v}_1 + \cdots + c_p\lambda_p\mathbf{v}_p = \lambda_{p+1}\mathbf{v}_{p+1}$ (6)

Multiplying both sides of (5) by $\lambda_{p+1}$ and subtracting the result from (6), we have

$c_1(\lambda_1 - \lambda_{p+1})\mathbf{v}_1 + \cdots + c_p(\lambda_p - \lambda_{p+1})\mathbf{v}_p = \mathbf{0}$ (7)

Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is linearly independent, the weights in (7) are all zero. But none of
the factors $\lambda_i - \lambda_{p+1}$ are zero, because the eigenvalues are distinct. Hence $c_i = 0$ for
i = 1, ..., p. But then (5) says that $\mathbf{v}_{p+1} = \mathbf{0}$, which is impossible. Hence $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$
cannot be linearly dependent and therefore must be linearly independent. ■

Eigenvectors and Difference Equations

This section concludes by showing how to construct solutions of the first-order difference
equation discussed in the chapter introductory example:

$\mathbf{x}_{k+1} = A\mathbf{x}_k$ (k = 0, 1, 2, ...) (8)

If A is an $n \times n$ matrix, then (8) is a recursive description of a sequence $\{\mathbf{x}_k\}$ in $\mathbb{R}^n$.
A solution of (8) is an explicit description of $\{\mathbf{x}_k\}$ whose formula for each $\mathbf{x}_k$ does not
depend directly on A or on the preceding terms in the sequence other than the initial
term $\mathbf{x}_0$.

The simplest way to build a solution of (8) is to take an eigenvector $\mathbf{x}_0$ and its
corresponding eigenvalue $\lambda$ and let

$\mathbf{x}_k = \lambda^k\mathbf{x}_0$ (k = 1, 2, ...) (9)

This sequence is a solution because

$A\mathbf{x}_k = A(\lambda^k\mathbf{x}_0) = \lambda^k(A\mathbf{x}_0) = \lambda^k(\lambda\mathbf{x}_0) = \lambda^{k+1}\mathbf{x}_0 = \mathbf{x}_{k+1}$

Linear combinations of solutions in the form of equation (9) are solutions, too! See 
Exercise 33. 
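The closed-form solution (9) can be compared against the recursion (8) directly. This sketch reuses the matrix of Example 2 with the eigenvector (1, 1) and eigenvalue 7 found in Example 3; the function names are illustrative.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def iterate(A, x0, k):
    """Apply the recursion x_{j+1} = A x_j a total of k times."""
    x = list(x0)
    for _ in range(k):
        x = mat_vec(A, x)
    return x


def closed_form(lam, x0, k):
    """Evaluate x_k = lam^k x0 without using A at all."""
    return [lam ** k * v for v in x0]


A = [[1, 6],
     [5, 2]]
x0 = [1, 1]   # eigenvector for lam = 7 (Example 3)
lam = 7

for k in range(5):
    print(k, iterate(A, x0, k), closed_form(lam, x0, k))
```

The two columns agree for every k, which is what the chain of equalities above asserts. The agreement depends on x0 being an eigenvector; a general starting vector would first have to be written as a combination of eigenvectors.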

PRACTICE PROBLEMS 

1. Is 5 an eigenvalue of $A = \begin{bmatrix} 6 & -3 & 1 \\ 3 & 0 & 5 \\ 2 & 2 & 6 \end{bmatrix}$?

2. If x is an eigenvector of A corresponding to $\lambda$, what is $A^3\mathbf{x}$?

3. Suppose that $\mathbf{b}_1$ and $\mathbf{b}_2$ are eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and
$\lambda_2$, respectively, and suppose that $\mathbf{b}_3$ and $\mathbf{b}_4$ are linearly independent eigenvectors
corresponding to a third distinct eigenvalue $\lambda_3$. Does it necessarily follow that
$\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3, \mathbf{b}_4\}$ is a linearly independent set? [Hint: Consider the equation
$c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + (c_3\mathbf{b}_3 + c_4\mathbf{b}_4) = \mathbf{0}$.]


5.1 EXERCISES 


1. Is A = 2 an eigenvalue of 

2. Is A = —3 an eigenvalue of 

an eigenvector of 


3 2 

3 8 


? Why or why not? 


-1 4 

6 9 


3. Is 

value. 

4. Is 


6 -4 


? Why or why not? 

? If so, find the eigen- 


an eigenvector of 


eigenvalue. 


5 2 

3 6 



3" 


'-4 

3 

3" 

5. Is 

-2 

an eigenvector of 

2 

-3 

-2 


1 


-1 

0 

-2 


the eigenvalue. 


? If so, find the 


? If so, find 


6. Is 


an eigenvector of 


eigenvalue. 

7. Is A = 4 an eigenvalue of 
corresponding eigenvector. 


6 7 

2 7 ? If so, find the 

6 4_ 

0 -1 " 

3 1 ? If so, find one 

4 5 


4-2 

8. Is A = 1 an eigenvalue of 0—1 

_-l 2 

corresponding eigenvector. 


3 ? If so, find one 
-2 


In Exercises 9-16, find a basis for the eigenspace corresponding
to each listed eigenvalue.
10. A

11 . A 

12. A 

13. A 

14. A 

15. A 

16. A . 


-4 2 

3 1 

1 -3 
-4 5 


,X = —5 
，久 =—1, 7 
，又 = 3, 7 


4 0 1 

-2 1 0 
一 2 0 1 

4 0 -1 

3 0 3 

2-2 5 


,A = 1, 2 ,； 


,A = 3 


1 

-3 : 

3 -： 

0 一 1 
3 0 

-1 3 

-2 -2 


，又: 

0一 

0 

0 

4 


,A = 4 


Find the eigenvalues of the matrices in Exercises 17 and 18.

17. $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & -2 \end{bmatrix}$

18. $\begin{bmatrix} 5 & 0 & 0 \\ 0 & 0 & 0 \\ -1 & 0 & 3 \end{bmatrix}$


19. For ^ 


find one eigenvalue, with no cal¬ 


culation. Justify your answer. 

20. Without calculation, find one eigenvalue and two linearly 

:2 2 2 ' 

independent eigenvectors of A 


.Justify 


your answer. 

In Exercises 21 and 22, A is an $n \times n$ matrix. Mark each statement
True or False. Justify each answer.


21. a. If $A\mathbf{x} = \lambda\mathbf{x}$ for some vector x, then $\lambda$ is an eigenvalue of
A.

b. A matrix A is not invertible if and only if 0 is an eigenvalue
of A.

c. A number c is an eigenvalue of A if and only if the
equation (A - cI)x = 0 has a nontrivial solution.

d. Finding an eigenvector of A may be difficult, but checking
whether a given vector is in fact an eigenvector is
easy.

e. To find the eigenvalues of A, reduce A to echelon form. 

22. a. If $A\mathbf{x} = \lambda\mathbf{x}$ for some scalar $\lambda$, then x is an eigenvector of
A.

b. If $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent eigenvectors, then
they correspond to distinct eigenvalues.

c. A steady-state vector for a stochastic matrix is actually an
eigenvector.

d. The eigenvalues of a matrix are on its main diagonal.

e. An eigenspace of A is a null space of a certain matrix.

23. Explain why a 2 x 2 matrix can have at most two distinct 
eigenvalues. Explain why an n x n matrix can have at most 
n distinct eigenvalues. 

24. Construct an example of a 2 x 2 matrix with only one distinct 
eigenvalue. 

25. Let $\lambda$ be an eigenvalue of an invertible matrix A. Show that
$\lambda^{-1}$ is an eigenvalue of $A^{-1}$. [Hint: Suppose a nonzero x
satisfies $A\mathbf{x} = \lambda\mathbf{x}$.]

26. Show that if $A^2$ is the zero matrix, then the only eigenvalue
of A is 0.

27. Show that $\lambda$ is an eigenvalue of A if and only if $\lambda$ is an
eigenvalue of $A^T$. [Hint: Find out how $A - \lambda I$ and $A^T - \lambda I$
are related.]

28. Use Exercise 27 to complete the proof of Theorem 1 for the 
case in which A is lower triangular. 

29. Consider an $n \times n$ matrix A with the property that the row
sums all equal the same number s. Show that s is an
eigenvalue of A. [Hint: Find an eigenvector.]

30. Consider an $n \times n$ matrix A with the property that the column
sums all equal the same number s. Show that s is an
eigenvalue of A. [Hint: Use Exercises 27 and 29.]

In Exercises 31 and 32, let A be the matrix of the linear transformation
T. Without writing A, find an eigenvalue of A and
describe the eigenspace.

31. T is the transformation on $\mathbb{R}^2$ that reflects points across some
line through the origin.

32. T is the transformation on $\mathbb{R}^3$ that rotates points about some
line through the origin.

33. Let u and v be eigenvectors of a matrix A, with corresponding
eigenvalues $\lambda$ and $\mu$, and let $c_1$ and $c_2$ be scalars. Define

$\mathbf{x}_k = c_1\lambda^k\mathbf{u} + c_2\mu^k\mathbf{v}$ (k = 0, 1, 2, ...)

a. What is $\mathbf{x}_{k+1}$, by definition?

b. Compute $A\mathbf{x}_k$ from the formula for $\mathbf{x}_k$, and show that
$A\mathbf{x}_k = \mathbf{x}_{k+1}$. This calculation will prove that the sequence
$\{\mathbf{x}_k\}$ defined above satisfies the difference equation
$\mathbf{x}_{k+1} = A\mathbf{x}_k$ (k = 0, 1, 2, ...).

34. Describe how you might try to build a solution of a difference
equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ (k = 0, 1, 2, ...) if you were given the
initial $\mathbf{x}_0$ and this vector did not happen to be an eigenvector
of A. [Hint: How might you relate $\mathbf{x}_0$ to eigenvectors of A?]

35. Let u and v be the vectors shown in the figure, and suppose
u and v are eigenvectors of a $2 \times 2$ matrix A that correspond
to eigenvalues 2 and 3, respectively. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be
the linear transformation given by T(x) = Ax for each x in
$\mathbb{R}^2$, and let w = u + v. Make a copy of the figure, and on
the same coordinate system, carefully plot the vectors T(u),
T(v), and T(w).




36. Repeat Exercise 35, assuming u and v are eigenvectors of A
that correspond to eigenvalues —1 and 3, respectively. 

[M] In Exercises 37-40, use a matrix program to find the eigenvalues
of the matrix. Then use the method of Example 4 with a
row reduction routine to produce a basis for each eigenspace. 


12 

37. 2 

1 


11 



"5 

-2 

2 

-4" 

38. 

7 

-4 

2 

-4 

4 

-4 

2 

0 


3 

-1 

1 

-3 


39. 


12 

8 

16 

0 

8 


—90 

-49 

-52 

—30 

-41 


30 

15 

12 

10 

15 


30 30 

15 15 

0 20 
22 10 
15 7 



"-23 

57 

-9 

-15 

-59 


-10 

12 

-10 

2 

-22 

40. 

11 

5 

-3 

-19 

—15 


-27 

31 

-27 

25 

-37 


-5 

-15 

-5 

1 

31 


SOLUTIONS TO PRACTICE PROBLEMS 


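1. The number 5 is an eigenvalue of A if and only if the equation (A - 5I)x = 0 has a
nontrivial solution. Form

$A - 5I = \begin{bmatrix} 6 & -3 & 1 \\ 3 & 0 & 5 \\ 2 & 2 & 6 \end{bmatrix} - \begin{bmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{bmatrix} = \begin{bmatrix} 1 & -3 & 1 \\ 3 & -5 & 5 \\ 2 & 2 & 1 \end{bmatrix}$

and row reduce the augmented matrix:

$\begin{bmatrix} 1 & -3 & 1 & 0 \\ 3 & -5 & 5 & 0 \\ 2 & 2 & 1 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 1 & 0 \\ 0 & 4 & 2 & 0 \\ 0 & 8 & -1 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -3 & 1 & 0 \\ 0 & 4 & 2 & 0 \\ 0 & 0 & -5 & 0 \end{bmatrix}$

At this point, it is clear that the homogeneous system has no free variables. Thus
A - 5I is an invertible matrix, which means that 5 is not an eigenvalue of A.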

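2. If $\mathbf{x}$ is an eigenvector of $A$ corresponding to $\lambda$, then $A\mathbf{x} = \lambda\mathbf{x}$ and so

$$A^2\mathbf{x} = A(\lambda\mathbf{x}) = \lambda A\mathbf{x} = \lambda^2\mathbf{x}$$

Again, $A^3\mathbf{x} = A(A^2\mathbf{x}) = A(\lambda^2\mathbf{x}) = \lambda^2 A\mathbf{x} = \lambda^3\mathbf{x}$. The general pattern, $A^k\mathbf{x} = \lambda^k\mathbf{x}$,
is proved by induction.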

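3. Yes. Suppose $c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + c_3\mathbf{b}_3 + c_4\mathbf{b}_4 = \mathbf{0}$. Since any linear combination of
eigenvectors from the same eigenvalue is again an eigenvector for that eigenvalue
(unless it is the zero vector), $c_3\mathbf{b}_3 + c_4\mathbf{b}_4$ is either $\mathbf{0}$ or an eigenvector for $\lambda_3$.
If it were an eigenvector, then by Theorem 2 the vectors $\mathbf{b}_1$, $\mathbf{b}_2$, and $c_3\mathbf{b}_3 + c_4\mathbf{b}_4$
would be linearly independent, and the equation

$$c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + (c_3\mathbf{b}_3 + c_4\mathbf{b}_4) = \mathbf{0}$$

would be a dependence relation with a nonzero weight on the third vector, which is
impossible. So $c_3\mathbf{b}_3 + c_4\mathbf{b}_4 = \mathbf{0}$, and the equation reduces to $c_1\mathbf{b}_1 + c_2\mathbf{b}_2 = \mathbf{0}$,
which implies $c_1 = c_2 = 0$ because $\mathbf{b}_1$ and $\mathbf{b}_2$ are linearly independent (Theorem 2).
But then $c_3$ and $c_4$ must also be zero since $\mathbf{b}_3$ and $\mathbf{b}_4$ are
linearly independent. Hence all the coefficients in the original equation must be
zero, and the vectors $\mathbf{b}_1$, $\mathbf{b}_2$, $\mathbf{b}_3$, and $\mathbf{b}_4$ are linearly independent.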


5.2 THE CHARACTERISTIC EQUATION 

Useful information about the eigenvalues of a square matrix A is encoded in a special 
scalar equation called the characteristic equation of A. A simple example will lead to 
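the general case.

EXAMPLE 1 Find the eigenvalues of $A = \begin{bmatrix} 2 & 3 \\ 3 & -6 \end{bmatrix}$.

SOLUTION We must find all scalars $\lambda$ such that the matrix equation

$$(A - \lambda I)\mathbf{x} = \mathbf{0}$$

has a nontrivial solution. By the Invertible Matrix Theorem in Section 2.3, this problem
is equivalent to finding all $\lambda$ such that the matrix $A - \lambda I$ is not invertible, where

$$
A - \lambda I = \begin{bmatrix} 2 & 3 \\ 3 & -6 \end{bmatrix}
- \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}
= \begin{bmatrix} 2 - \lambda & 3 \\ 3 & -6 - \lambda \end{bmatrix}
$$

By Theorem 4 in Section 2.2, this matrix fails to be invertible precisely when its
determinant is zero. So the eigenvalues of $A$ are the solutions of the equation

$$
\det(A - \lambda I) = \det \begin{bmatrix} 2 - \lambda & 3 \\ 3 & -6 - \lambda \end{bmatrix} = 0
$$

Recall that

$$
\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc
$$

So

$$
\begin{aligned}
\det(A - \lambda I) &= (2 - \lambda)(-6 - \lambda) - (3)(3) \\
&= -12 + 6\lambda - 2\lambda + \lambda^2 - 9 \\
&= \lambda^2 + 4\lambda - 21 \\
&= (\lambda - 3)(\lambda + 7)
\end{aligned}
$$

If $\det(A - \lambda I) = 0$, then $\lambda = 3$ or $\lambda = -7$. So the eigenvalues of $A$ are 3 and $-7$. ■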

The determinant in Example 1 transformed the matrix equation $(A - \lambda I)\mathbf{x} = \mathbf{0}$,
which involves two unknowns ($\lambda$ and $\mathbf{x}$), into the scalar equation $\lambda^2 + 4\lambda - 21 = 0$,
which involves only one unknown. The same idea works for $n \times n$ matrices. However,
before turning to larger matrices, we summarize the properties of determinants needed
to study eigenvalues.
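The calculation in Example 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names (`char_poly`, `roots`, `eigs`) are illustrative and not part of the text. It builds the coefficients of $\lambda^2 + 4\lambda - 21$ from the trace and determinant of the $2 \times 2$ matrix, finds the roots, and compares them with a general-purpose eigenvalue routine.

```python
import numpy as np

# Matrix from Example 1.
A = np.array([[2.0, 3.0],
              [3.0, -6.0]])

# For a 2x2 matrix, det(A - t I) = t^2 - (trace A) t + det A.
trace = A[0, 0] + A[1, 1]
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
char_poly = [1.0, -trace, det]          # coefficients of t^2 + 4t - 21

# Roots of the characteristic polynomial, sorted for a stable comparison.
roots = np.sort(np.roots(char_poly))

# Independent check with NumPy's eigenvalue routine.
eigs = np.sort(np.linalg.eigvals(A))

print(roots, eigs)
```

Both computations should agree with the hand calculation, giving the eigenvalues $-7$ and 3.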


Determinants 


Let $A$ be an $n \times n$ matrix, let $U$ be any echelon form obtained from $A$ by row
replacements and row interchanges (without scaling), and let $r$ be the number of such
row interchanges. Then the determinant of $A$, written as $\det A$, is $(-1)^r$ times the
product of the diagonal entries $u_{11}, \ldots, u_{nn}$ in $U$. If $A$ is invertible, then $u_{11}, \ldots, u_{nn}$
are all pivots (because $A \sim I_n$ and the $u_{ii}$ have not been scaled to 1's). Otherwise, at
least $u_{nn}$ is zero, and the product $u_{11} \cdots u_{nn}$ is zero. Thus [1]

$$
\det A = \begin{cases}
(-1)^r \cdot \left(\text{product of pivots in } U\right), & \text{when } A \text{ is invertible} \\
0, & \text{when } A \text{ is not invertible}
\end{cases}
\qquad (1)
$$

[1] Formula (1) was derived in Section 3.2. Readers who have not studied Chapter 3 may use this formula as
the definition of $\det A$. It is a remarkable and nontrivial fact that any echelon form $U$ obtained from $A$
without scaling gives the same value for $\det A$.

EXAMPLE 2 Compute $\det A$ for $A = \begin{bmatrix} 1 & 5 & 0 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{bmatrix}$.

SOLUTION The following row reduction uses one row interchange:

$$
A \sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -6 & -1 \\ 0 & -2 & 0 \end{bmatrix}
\sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -2 & 0 \\ 0 & -6 & -1 \end{bmatrix}
\sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -1 \end{bmatrix} = U_1
$$

So $\det A$ equals $(-1)^1(1)(-2)(-1) = -2$. The following alternative row reduction
avoids the row interchange and produces a different echelon form. The last step adds
$-1/3$ times row 2 to row 3:

$$
A \sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -6 & -1 \\ 0 & -2 & 0 \end{bmatrix}
\sim \begin{bmatrix} 1 & 5 & 0 \\ 0 & -6 & -1 \\ 0 & 0 & 1/3 \end{bmatrix} = U_2
$$

This time $\det A$ is $(-1)^0(1)(-6)(1/3) = -2$, the same as before. ■
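Formula (1) translates directly into a short procedure. The sketch below is one possible implementation, not the text's own algorithm: the function name `det_by_echelon` is hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an assumption made here to avoid rounding. It uses only row replacements and row interchanges, counts the interchanges, and multiplies the diagonal entries of the resulting echelon form.

```python
from fractions import Fraction


def det_by_echelon(rows):
    """Determinant via formula (1): reduce to echelon form with row
    replacements and interchanges only (no scaling), then return
    (-1)^r times the product of the diagonal entries."""
    U = [[Fraction(x) for x in row] for row in rows]
    n = len(U)
    swaps = 0
    for j in range(n):
        # Look for a nonzero entry in column j, at or below row j.
        pivot = next((i for i in range(j, n) if U[i][j] != 0), None)
        if pivot is None:
            return Fraction(0)      # no pivot in column j: A is not invertible
        if pivot != j:
            U[j], U[pivot] = U[pivot], U[j]   # row interchange
            swaps += 1
        for i in range(j + 1, n):
            factor = U[i][j] / U[j][j]
            # Row replacement: row_i <- row_i - factor * row_j.
            U[i] = [a - factor * b for a, b in zip(U[i], U[j])]
    product = Fraction(1)
    for j in range(n):
        product *= U[j][j]
    return (-1) ** swaps * product


# Matrix from Example 2.
A = [[1, 5, 0], [2, 4, -1], [0, -2, 0]]
print(det_by_echelon(A))
```

For the matrix of Example 2 this procedure happens to follow the second row reduction (no interchange is needed), and it returns $-2$.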

Formula (1) for the determinant shows that $A$ is invertible if and only if $\det A$ is
nonzero. This fact, and the characterization of invertibility found in Section 5.1, can be
added to the Invertible Matrix Theorem.


THEOREM The Invertible Matrix Theorem (continued)

Let $A$ be an $n \times n$ matrix. Then $A$ is invertible if and only if:

s. The number 0 is not an eigenvalue of $A$.

t. The determinant of $A$ is not zero.


When $A$ is a $3 \times 3$ matrix, $|\det A|$ turns out to be the volume of the parallelepiped
determined by the columns $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ of $A$, as in Fig. 1. (See Section 3.3 for details.)
This volume is nonzero if and only if the vectors $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ are linearly independent, in
which case the matrix $A$ is invertible. (If the vectors are nonzero and linearly dependent,
they lie in a plane or along a line.)

FIGURE 1

The next theorem lists facts needed from Sections 3.1 and 3.2. Part (a) is included 
here for convenient reference. 


THEOREM 3 Properties of Determinants

Let $A$ and $B$ be $n \times n$ matrices.

a. $A$ is invertible if and only if $\det A \neq 0$.

b. $\det AB = (\det A)(\det B)$.

c. $\det A^T = \det A$.

d. If $A$ is triangular, then $\det A$ is the product of the entries on the main diagonal
of $A$.

e. A row replacement operation on $A$ does not change the determinant. A row
interchange changes the sign of the determinant. A row scaling also scales the
determinant by the same scalar factor.




















The Characteristic Equation 

Theorem 3(a) shows how to determine when a matrix of the form $A - \lambda I$ is not
invertible. The scalar equation $\det(A - \lambda I) = 0$ is called the characteristic equation
of $A$, and the argument in Example 1 justifies the following fact.


A scalar $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$ if and only if $\lambda$ satisfies the
characteristic equation

$$\det(A - \lambda I) = 0$$


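EXAMPLE 3 Find the characteristic equation of

$$
A = \begin{bmatrix} 5 & -2 & 6 & -1 \\ 0 & 3 & -8 & 0 \\ 0 & 0 & 5 & 4 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

SOLUTION Form $A - \lambda I$, and use Theorem 3(d):

$$
\det(A - \lambda I) = \det \begin{bmatrix}
5 - \lambda & -2 & 6 & -1 \\
0 & 3 - \lambda & -8 & 0 \\
0 & 0 & 5 - \lambda & 4 \\
0 & 0 & 0 & 1 - \lambda
\end{bmatrix}
= (5 - \lambda)(3 - \lambda)(5 - \lambda)(1 - \lambda)
$$

The characteristic equation is

$$(5 - \lambda)^2(3 - \lambda)(1 - \lambda) = 0$$

or

$$(\lambda - 5)^2(\lambda - 3)(\lambda - 1) = 0$$

Expanding the product, we can also write

$$\lambda^4 - 14\lambda^3 + 68\lambda^2 - 130\lambda + 75 = 0 \quad ■$$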
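The expansion at the end of Example 3 is easy to get wrong by hand, so a quick machine check may be useful. This is a small sketch assuming NumPy; it relies on the fact from Theorem 3(d) that for a triangular matrix the roots of the characteristic polynomial are the diagonal entries, and uses `numpy.poly` to expand the product of the factors $(\lambda - d_i)$. The name `coeffs` is illustrative.

```python
import numpy as np

# Triangular matrix from Example 3.
A = np.array([[5, -2, 6, -1],
              [0, 3, -8, 0],
              [0, 0, 5, 4],
              [0, 0, 0, 1]], dtype=float)

# For a triangular matrix the eigenvalues are the diagonal entries, so the
# characteristic polynomial is the product of (t - d_i) over the diagonal.
# numpy.poly expands that product and returns the coefficients, highest power first.
coeffs = np.poly(np.diag(A))

print(coeffs)   # coefficients of t^4, t^3, t^2, t, 1
```

Because $n = 4$ is even, $\det(A - \lambda I)$ and the monic polynomial $\prod(\lambda - d_i)$ coincide here; for odd $n$ they would differ by a factor of $-1$.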

In Examples 1 and 3, $\det(A - \lambda I)$ is a polynomial in $\lambda$. It can be shown that if $A$ is
an $n \times n$ matrix, then $\det(A - \lambda I)$ is a polynomial of degree $n$ called the characteristic
polynomial of $A$.

The eigenvalue 5 in Example 3 is said to have multiplicity 2 because $(\lambda - 5)$ occurs
two times as a factor of the characteristic polynomial. In general, the (algebraic)
multiplicity of an eigenvalue $\lambda$ is its multiplicity as a root of the characteristic equation.


EXAMPLE 4 The characteristic polynomial of a $6 \times 6$ matrix is $\lambda^6 - 4\lambda^5 - 12\lambda^4$.
Find the eigenvalues and their multiplicities.

SOLUTION Factor the polynomial

$$\lambda^6 - 4\lambda^5 - 12\lambda^4 = \lambda^4(\lambda^2 - 4\lambda - 12) = \lambda^4(\lambda - 6)(\lambda + 2)$$

The eigenvalues are 0 (multiplicity 4), 6 (multiplicity 1), and $-2$ (multiplicity 1). ■

We could also list the eigenvalues in Example 4 as 0, 0, 0, 0, 6, and $-2$, so that the
eigenvalues are repeated according to their multiplicities.

Because the characteristic equation for an $n \times n$ matrix involves an $n$th-degree
polynomial, the equation has exactly $n$ roots, counting multiplicities, provided complex
roots are allowed. Such complex roots, called complex eigenvalues, will be discussed
in Section 5.5. Until then, we consider only real eigenvalues, and scalars will continue
to be real numbers.

The characteristic equation is important for theoretical purposes. In practical
work, however, eigenvalues of any matrix larger than $2 \times 2$ should be found by a
computer, unless the matrix is triangular or has other special properties. Although a
$3 \times 3$ characteristic polynomial is easy to compute by hand, factoring it can be difficult
(unless the matrix is carefully chosen). See the Numerical Notes at the end of this
section.

Similarity 

The next theorem illustrates one use of the characteristic polynomial, and it provides
the foundation for several iterative methods that approximate eigenvalues. If $A$ and
$B$ are $n \times n$ matrices, then $A$ is similar to $B$ if there is an invertible matrix $P$
such that $P^{-1}AP = B$, or, equivalently, $A = PBP^{-1}$. Writing $Q$ for $P^{-1}$, we have
$Q^{-1}BQ = A$. So $B$ is also similar to $A$, and we say simply that $A$ and $B$ are similar.
Changing $A$ into $P^{-1}AP$ is called a similarity transformation.


THEOREM 4 If $n \times n$ matrices $A$ and $B$ are similar, then they have the same characteristic
polynomial and hence the same eigenvalues (with the same multiplicities).


PROOF If $B = P^{-1}AP$, then

$$B - \lambda I = P^{-1}AP - \lambda P^{-1}P = P^{-1}(AP - \lambda P) = P^{-1}(A - \lambda I)P$$

Using the multiplicative property (b) in Theorem 3, we compute

$$
\begin{aligned}
\det(B - \lambda I) &= \det[P^{-1}(A - \lambda I)P] \\
&= \det(P^{-1}) \cdot \det(A - \lambda I) \cdot \det(P) \qquad (2)
\end{aligned}
$$

Since $\det(P^{-1}) \cdot \det(P) = \det(P^{-1}P) = \det I = 1$, we see from equation (2) that
$\det(B - \lambda I) = \det(A - \lambda I)$. ■
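Theorem 4 can be illustrated with a quick numerical experiment. In the sketch below (assuming NumPy), the matrix `A` is taken from Example 1, while the invertible matrix `P` is an arbitrary choice made only for illustration. `numpy.poly` applied to a square matrix returns the coefficients of $\det(tI - A)$, which for a $2 \times 2$ matrix equals $\det(A - tI)$.

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [3.0, -6.0]])        # matrix from Example 1

# An arbitrary invertible matrix (det P = 1), chosen only for illustration.
P = np.array([[1.0, 2.0],
              [1.0, 3.0]])

# Similarity transformation: B = P^{-1} A P.
B = np.linalg.inv(P) @ A @ P

# Characteristic polynomial coefficients (highest power first).
poly_A = np.poly(A)
poly_B = np.poly(B)

print(B)
print(poly_A, poly_B)
```

The entries of `B` look nothing like those of `A`, yet both polynomials come out as $t^2 + 4t - 21$, up to rounding.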


WARNINGS:

1. The matrices

$$
\begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}
$$

are not similar even though they have the same eigenvalues.

2. Similarity is not the same as row equivalence. (If $A$ is row equivalent to $B$,
then $B = EA$ for some invertible matrix $E$.) Row operations on a matrix
usually change its eigenvalues.

Application to Dynamical Systems

Eigenvalues and eigenvectors hold the key to the discrete evolution of a dynamical
system, as mentioned in the chapter introduction.


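EXAMPLE 5 Let $A = \begin{bmatrix} .95 & .03 \\ .05 & .97 \end{bmatrix}$. Analyze the long-term behavior of the dynamical
system defined by $\mathbf{x}_{k+1} = A\mathbf{x}_k$ $(k = 0, 1, 2, \ldots)$, with $\mathbf{x}_0 = \begin{bmatrix} .6 \\ .4 \end{bmatrix}$.

SOLUTION The first step is to find the eigenvalues of $A$ and a basis for each eigenspace.
The characteristic equation for $A$ is

$$
\begin{aligned}
0 &= \det \begin{bmatrix} .95 - \lambda & .03 \\ .05 & .97 - \lambda \end{bmatrix}
= (.95 - \lambda)(.97 - \lambda) - (.03)(.05) \\
&= \lambda^2 - 1.92\lambda + .92
\end{aligned}
$$

By the quadratic formula

$$
\lambda = \frac{1.92 \pm \sqrt{(1.92)^2 - 4(.92)}}{2} = \frac{1.92 \pm \sqrt{.0064}}{2}
= \frac{1.92 \pm .08}{2} = 1 \text{ or } .92
$$

It is readily checked that eigenvectors corresponding to $\lambda = 1$ and $\lambda = .92$ are multiples
of

$$
\mathbf{v}_1 = \begin{bmatrix} 3 \\ 5 \end{bmatrix}
\quad \text{and} \quad
\mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}
$$

respectively.

The next step is to write the given $\mathbf{x}_0$ in terms of $\mathbf{v}_1$ and $\mathbf{v}_2$. This can be done because
$\{\mathbf{v}_1, \mathbf{v}_2\}$ is obviously a basis for $\mathbb{R}^2$. (Why?) So there exist weights $c_1$ and $c_2$ such that

$$
\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 = [\,\mathbf{v}_1 \ \ \mathbf{v}_2\,] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \qquad (3)
$$

In fact,

$$
\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = [\,\mathbf{v}_1 \ \ \mathbf{v}_2\,]^{-1}\mathbf{x}_0
= \begin{bmatrix} 3 & 1 \\ 5 & -1 \end{bmatrix}^{-1} \begin{bmatrix} .60 \\ .40 \end{bmatrix}
= \frac{1}{-8} \begin{bmatrix} -1 & -1 \\ -5 & 3 \end{bmatrix} \begin{bmatrix} .60 \\ .40 \end{bmatrix}
= \begin{bmatrix} .125 \\ .225 \end{bmatrix} \qquad (4)
$$

Because $\mathbf{v}_1$ and $\mathbf{v}_2$ in (3) are eigenvectors of $A$, with $A\mathbf{v}_1 = \mathbf{v}_1$ and $A\mathbf{v}_2 = .92\mathbf{v}_2$, we
easily compute each $\mathbf{x}_k$:

$$
\begin{aligned}
\mathbf{x}_1 = A\mathbf{x}_0 &= c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2 \\
&= c_1\mathbf{v}_1 + c_2(.92)\mathbf{v}_2 && \mathbf{v}_1 \text{ and } \mathbf{v}_2 \text{ are eigenvectors.} \\
\mathbf{x}_2 = A\mathbf{x}_1 &= c_1 A\mathbf{v}_1 + c_2(.92)A\mathbf{v}_2 \\
&= c_1\mathbf{v}_1 + c_2(.92)^2\mathbf{v}_2
\end{aligned}
$$

and so on. In general,

$$\mathbf{x}_k = c_1\mathbf{v}_1 + c_2(.92)^k\mathbf{v}_2 \qquad (k = 0, 1, 2, \ldots)$$

Using $c_1$ and $c_2$ from (4),

$$
\mathbf{x}_k = .125 \begin{bmatrix} 3 \\ 5 \end{bmatrix} + .225(.92)^k \begin{bmatrix} 1 \\ -1 \end{bmatrix}
\qquad (k = 0, 1, 2, \ldots) \qquad (5)
$$

This explicit formula for $\mathbf{x}_k$ gives the solution of the difference equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$.
As $k \to \infty$, $(.92)^k$ tends to zero and $\mathbf{x}_k$ tends to $\begin{bmatrix} .375 \\ .625 \end{bmatrix} = .125\mathbf{v}_1$. ■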


The calculations in Example 5 have an interesting application to a Markov chain
discussed in Section 4.9. Those who read that section may recognize that matrix $A$
in Example 5 above is the same as the migration matrix $M$ in Section 4.9, $\mathbf{x}_0$ is the
initial population distribution between city and suburbs, and $\mathbf{x}_k$ represents the population
distribution after $k$ years.

Theorem 18 in Section 4.9 stated that for a matrix such as $A$, the sequence $\mathbf{x}_k$ tends
to a steady-state vector. Now we know why the $\mathbf{x}_k$ behave this way, at least for the
migration matrix. The steady-state vector is $.125\mathbf{v}_1$, a multiple of the eigenvector $\mathbf{v}_1$,
and formula (5) for $\mathbf{x}_k$ shows precisely why $\mathbf{x}_k \to .125\mathbf{v}_1$.
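The closed-form solution in formula (5) can be compared with direct iteration of $\mathbf{x}_{k+1} = A\mathbf{x}_k$. The following sketch assumes NumPy; the helper names `x_closed_form` and `x_iterated` are hypothetical and introduced only for this illustration. The weights $c_1$ and $c_2$ are found by solving the linear system in (3) rather than by inverting the matrix explicitly.

```python
import numpy as np

# Data from Example 5.
A = np.array([[0.95, 0.03],
              [0.05, 0.97]])
x0 = np.array([0.60, 0.40])
v1 = np.array([3.0, 5.0])     # eigenvector for eigenvalue 1
v2 = np.array([1.0, -1.0])    # eigenvector for eigenvalue 0.92

# Solve [v1 v2] c = x0 for the weights, as in equations (3) and (4).
c1, c2 = np.linalg.solve(np.column_stack([v1, v2]), x0)


def x_closed_form(k):
    """Formula (5): x_k = c1 v1 + c2 (0.92)^k v2."""
    return c1 * v1 + c2 * (0.92 ** k) * v2


def x_iterated(k):
    """Apply x -> A x a total of k times, starting from x0."""
    x = x0.copy()
    for _ in range(k):
        x = A @ x
    return x


print(c1, c2)
print(x_closed_form(10), x_iterated(10))
print(x_closed_form(500))     # close to the steady state .125 v1
```

The two computations should agree for every $k$, and for large $k$ both approach the steady-state vector $(.375, .625)$.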


NUMERICAL NOTES

1. Computer software such as Mathematica and Maple can use symbolic calculations
to find the characteristic polynomial of a moderate-sized matrix. But
there is no formula or finite algorithm to solve the characteristic equation of a
general $n \times n$ matrix for $n \geq 5$.

2. The best numerical methods for finding eigenvalues avoid the characteristic
polynomial entirely. In fact, MATLAB finds the characteristic polynomial
of a matrix $A$ by first computing the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ and then
expanding the product $(\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n)$.

3. Several common algorithms for estimating the eigenvalues of a matrix $A$
are based on Theorem 4. The powerful QR algorithm is discussed in the
exercises. Another technique, called Jacobi's method, works when $A = A^T$
and computes a sequence of matrices of the form

$$A_1 = A \quad \text{and} \quad A_{k+1} = P_k^{-1}A_kP_k \qquad (k = 1, 2, \ldots)$$

Each matrix in the sequence is similar to $A$ and so has the same eigenvalues
as $A$. The nondiagonal entries of $A_{k+1}$ tend to zero as $k$ increases, and the
diagonal entries tend to approach the eigenvalues of $A$.

4. Other methods of estimating eigenvalues are discussed in Section 5.8.
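To make Note 3 concrete, here is a rough sketch of the basic (unshifted) QR iteration, in which each step factors $A_k = Q_kR_k$ and forms $A_{k+1} = R_kQ_k = Q_k^{-1}A_kQ_k$, a similarity transformation. This is a simplified illustration assuming NumPy, not the algorithm used by production software, which adds shifts and a preliminary reduction. The function name `qr_iteration` and the step count are assumptions made here, and the test matrix is the one from Example 1.

```python
import numpy as np


def qr_iteration(A, steps=60):
    """Basic unshifted QR iteration: factor A_k = Q R, then set A_{k+1} = R Q.
    Each A_{k+1} equals Q^{-1} A_k Q, so it is similar to A_k (Theorem 4)."""
    Ak = np.array(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak


# Symmetric matrix from Example 1, with eigenvalues 3 and -7.
A = [[2.0, 3.0],
     [3.0, -6.0]]
Ak = qr_iteration(A)

print(Ak)            # nearly diagonal
print(np.diag(Ak))   # diagonal entries approximate the eigenvalues
```

For this matrix the off-diagonal entries shrink roughly like $(3/7)^k$, so after a few dozen steps the diagonal holds the eigenvalues to machine precision. Convergence of the unshifted iteration is not guaranteed for every matrix (for example, when eigenvalues share the same absolute value).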


PRACTICE PROBLEM 


Find the characteristic equation and eigenvalues of $A = \begin{bmatrix} 1 & -4 \\ 4 & 2 \end{bmatrix}$.

5.2 EXERCISES 


Find the characteristic polynomial and the real eigenvalues of the 
matrices in Exercises 1-8. 


6 . 




8 . 


3 . 



Exercises 9-14 require techniques from Section 3.1. Find the
characteristic polynomial of each matrix, using either a cofactor
expansion or the special formula for $3 \times 3$ determinants described

18. It can be shown that the algebraic multiplicity of an eigenvalue
$\lambda$ is always greater than or equal to the dimension of the
eigenspace corresponding to $\lambda$. Find $h$ in the matrix $A$ below
such that the eigenspace for $\lambda = 4$ is two-dimensional:


17. 


3 0 0 0 

-5100 
3 8 0 0 

0-721 
-4 1 9 -2 


For the matrices in Exercises 15-17，list the real eigenvalues, 
repeated according to their multiplicities. 


A = 


A widely used method for estimating eigenvalues of a general
matrix $A$ is the QR algorithm. Under suitable conditions, this algorithm
produces a sequence of matrices, all similar to $A$, that become
almost upper triangular, with diagonal entries that approach
the eigenvalues of $A$. The main idea is to factor $A$ (or another
matrix similar to $A$) in the form $A = Q_1R_1$, where $Q_1^T = Q_1^{-1}$
and $R_1$ is upper triangular. The factors are interchanged to form
$A_1 = R_1Q_1$, which is again factored as $A_1 = Q_2R_2$; then to form
$A_2 = R_2Q_2$, and so on. The similarity of $A, A_1, \ldots$ follows from
the more general result in Exercise 23.

23. Show that if $A = QR$ with $Q$ invertible, then $A$ is similar to
$A_1 = RQ$.

24. Show that if A and B are similar, then det A = det B. 

25. Let A = 

A is the stochastic matrix studied in Example 5 in Sec¬ 
tion 4.9.] 

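a. Find a basis for $\mathbb{R}^2$ consisting of $\mathbf{v}_1$ and another eigenvector
$\mathbf{v}_2$ of $A$.

b. Verify that $\mathbf{x}_0$ may be written in the form $\mathbf{x}_0 = \mathbf{v}_1 + c\mathbf{v}_2$.

c. For $k = 1, 2, \ldots$, define $\mathbf{x}_k = A^k\mathbf{x}_0$. Compute $\mathbf{x}_1$ and $\mathbf{x}_2$,
and write a formula for $\mathbf{x}_k$. Then show that $\mathbf{x}_k \to \mathbf{v}_1$ as $k$
increases.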


Vl 


3/7 

4/7 


and Xq : 


[Note: 


19. Let $A$ be an $n \times n$ matrix, and suppose $A$ has $n$ real eigenvalues,
$\lambda_1, \ldots, \lambda_n$, repeated according to multiplicities, so that

$$\det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda)$$

Explain why $\det A$ is the product of the $n$ eigenvalues of
$A$. (This result is true for any square matrix when complex
eigenvalues are considered.)

20. Use a property of determinants to show that $A$ and $A^T$ have
the same characteristic polynomial.

In Exercises 21 and 22, $A$ and $B$ are $n \times n$ matrices. Mark each
statement True or False. Justify each answer.

21. a. The determinant of $A$ is the product of the diagonal entries
in $A$.

b. An elementary row operation on $A$ does not change the
determinant.

c. $(\det A)(\det B) = \det AB$

d. If $\lambda + 5$ is a factor of the characteristic polynomial of $A$,
then 5 is an eigenvalue of $A$.


26. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Use formula (1) for a determinant
(given before Example 2) to show that $\det A = ad - bc$.
Consider two cases: $a \neq 0$ and $a = 0$.


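27. Let $A = \begin{bmatrix} .5 & .2 & .3 \\ .3 & .8 & .3 \\ .2 & 0 & .4 \end{bmatrix}$,
$\mathbf{v}_1 = \begin{bmatrix} .3 \\ .6 \\ .1 \end{bmatrix}$,
$\mathbf{v}_2 = \begin{bmatrix} 1 \\ -3 \\ 2 \end{bmatrix}$,
$\mathbf{v}_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$,
and $\mathbf{w} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$.

a. Show that $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are eigenvectors of $A$. [Note: $A$ is the
stochastic matrix studied in Example 3 of Section 4.9.]

b. Let $\mathbf{x}_0$ be any vector in $\mathbb{R}^3$ with nonnegative entries whose
sum is 1. (In Section 4.9, $\mathbf{x}_0$ was called a probability
vector.) Explain why there are constants $c_1, c_2, c_3$ such
that $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$. Compute $\mathbf{w}^T\mathbf{x}_0$, and deduce
that $c_1 = 1$.

c. For $k = 1, 2, \ldots$, define $\mathbf{x}_k = A^k\mathbf{x}_0$, with $\mathbf{x}_0$ as in part
(b). Show that $\mathbf{x}_k \to \mathbf{v}_1$ as $k$ increases.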


prior to Exercises 15-18 in Section 3.1. [Note: Finding the
characteristic polynomial of a $3 \times 3$ matrix is not easy to do with
just row operations, because the variable $\lambda$ is involved.]


22. a. If $A$ is $3 \times 3$, with columns $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$, then $\det A$ equals
the volume of the parallelepiped determined by $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$.

b. $\det A^T = (-1)\det A$.

c. The multiplicity of a root $r$ of the characteristic equation
of $A$ is called the algebraic multiplicity of $r$ as an eigenvalue
of $A$.

d. A row replacement operation on $A$ does not change the
eigenvalues.


4 2 3 3 

0 2 h 3 

0 0 4 14 

0 0 0 2 


9. 


11 . 


13. 


4 

0 

-1 



3 

1 

1 

0 

4 

-1 


10. 

0 

5 

0 

_ 1 

0 

2_ 



_-2 

0 

7_ 

"3 

0 

0" 



"-1 

0 

2" 

2 

1 

4 


12. 

3 

1 

0 

_ 1 

0 

4_ 



0 

1 

2_ 

6 

-2 

0 



4 

0 

-1 " 

-2 

9 

0 

14. 

-1 

0 

4 

5 

8 

3 



0 

2 

3 



3 

0 

0 

0 

16. 

6 

2 

0 

0 

0 

3 

6 

0 


2 

3 

3 

-5 


> 0 " 

> 0 
» 0 
0 

3_ 

the algebraic multiplicity of an eigen- 


2 6 2 5 
- 

0 3 3 0 
I 

5 2 0 0 

5 0 0 0 

I_ 

5 .

28. [M] Construct a random integer-valued $4 \times 4$ matrix $A$, and
verify that $A$ and $A^T$ have the same characteristic polynomial
(the same eigenvalues with the same multiplicities). Do $A$
and $A^T$ have the same eigenvectors? Make the same analysis
of a $5 \times 5$ matrix. Report the matrices and your conclusions.

29. [M] Construct a random integer-valued $4 \times 4$ matrix $A$.

a. Reduce $A$ to echelon form $U$ with no row scaling, and use
$U$ in formula (1) (before Example 2) to compute $\det A$. (If
$A$ happens to be singular, start over with a new random
matrix.)

b. Compute the eigenvalues of $A$ and the product of these
eigenvalues (as accurately as possible).

c. List the matrix $A$, and, to four decimal places, list the
pivots in $U$ and the eigenvalues of $A$. Compute $\det A$ with
your matrix program, and compare it with the products
you found in (a) and (b).

30. [M] Let $A = \begin{bmatrix} -6 & 28 & 21 \\ 4 & -15 & -12 \\ -8 & a & 25 \end{bmatrix}$. For each value of $a$ in
the set $\{32, 31.9, 31.8, 32.1, 32.2\}$, compute the characteristic
polynomial of $A$ and the eigenvalues. In each case, create
a graph of the characteristic polynomial $p(t) = \det(A - tI)$
for $0 \leq t \leq 3$. If possible, construct all graphs on one coordinate
system. Describe how the graphs reveal the changes
in the eigenvalues as $a$ changes.


SOLUTION TO PRACTICE PROBLEM 


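The characteristic equation is

$$
\begin{aligned}
0 = \det(A - \lambda I) &= \det \begin{bmatrix} 1 - \lambda & -4 \\ 4 & 2 - \lambda \end{bmatrix} \\
&= (1 - \lambda)(2 - \lambda) - (-4)(4) = \lambda^2 - 3\lambda + 18
\end{aligned}
$$

From the quadratic formula,

$$
\lambda = \frac{3 \pm \sqrt{(-3)^2 - 4(18)}}{2} = \frac{3 \pm \sqrt{-63}}{2}
$$

It is clear that the characteristic equation has no real solutions, so $A$ has no real
eigenvalues. The matrix $A$ is acting on the real vector space $\mathbb{R}^2$, and there is no nonzero
vector $\mathbf{v}$ in $\mathbb{R}^2$ such that $A\mathbf{v} = \lambda\mathbf{v}$ for some scalar $\lambda$.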


5.3 DIAGONALIZATION 


In many cases, the eigenvalue-eigenvector information contained within a matrix $A$ can
be displayed in a useful factorization of the form $A = PDP^{-1}$ where $D$ is a diagonal
matrix. In this section, the factorization enables us to compute $A^k$ quickly for large
values of $k$, a fundamental idea in several applications of linear algebra. Later, in
Sections 5.6 and 5.7, the factorization will be used to analyze (and decouple) dynamical
systems.

The following example illustrates that powers of a diagonal matrix are easy to
compute.


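EXAMPLE 1 If $D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}$, then
$D^2 = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix}$
and

$$
D^3 = DD^2 = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix}
= \begin{bmatrix} 5^3 & 0 \\ 0 & 3^3 \end{bmatrix}
$$

In general,

$$
D^k = \begin{bmatrix} 5^k & 0 \\ 0 & 3^k \end{bmatrix} \qquad \text{for } k \geq 1 \quad ■
$$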


If $A = PDP^{-1}$ for some invertible $P$ and diagonal $D$, then $A^k$ is also easy to
compute, as the next example shows.

EXAMPLE 2 Let $A = \begin{bmatrix} 7 & 2 \\ -4 & 1 \end{bmatrix}$. Find a formula for $A^k$, given that $A = PDP^{-1}$,
where

$$
P = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}
\quad \text{and} \quad
D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}
$$

SOLUTION The standard formula for the inverse of a $2 \times 2$ matrix yields

$$
P^{-1} = \begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix}
$$

Then, by associativity of matrix multiplication,

$$
\begin{aligned}
A^2 &= (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PDDP^{-1} \\
&= PD^2P^{-1} = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}
\begin{bmatrix} 5^2 & 0 \\ 0 & 3^2 \end{bmatrix}
\begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix}
\end{aligned}
$$

Again,

$$A^3 = (PDP^{-1})A^2 = (PDP^{-1})PD^2P^{-1} = PDD^2P^{-1} = PD^3P^{-1}$$

In general, for $k \geq 1$,

$$
\begin{aligned}
A^k = PD^kP^{-1} &= \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}
\begin{bmatrix} 5^k & 0 \\ 0 & 3^k \end{bmatrix}
\begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix} \\
&= \begin{bmatrix} 2 \cdot 5^k - 3^k & 5^k - 3^k \\ 2 \cdot 3^k - 2 \cdot 5^k & 2 \cdot 3^k - 5^k \end{bmatrix} \quad ■
\end{aligned}
$$

A square matrix $A$ is said to be diagonalizable if $A$ is similar to a diagonal matrix,
that is, if $A = PDP^{-1}$ for some invertible matrix $P$ and some diagonal matrix $D$.
The next theorem gives a characterization of diagonalizable matrices and tells how to
construct a suitable factorization.


THEOREM 5 The Diagonalization Theorem

An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent
eigenvectors.

In fact, $A = PDP^{-1}$, with $D$ a diagonal matrix, if and only if the columns of
$P$ are $n$ linearly independent eigenvectors of $A$. In this case, the diagonal entries
of $D$ are eigenvalues of $A$ that correspond, respectively, to the eigenvectors in $P$.


In other words, $A$ is diagonalizable if and only if there are enough eigenvectors to
form a basis of $\mathbb{R}^n$. We call such a basis an eigenvector basis of $\mathbb{R}^n$.


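PROOF First, observe that if $P$ is any $n \times n$ matrix with columns $\mathbf{v}_1, \ldots, \mathbf{v}_n$, and if $D$
is any diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$, then

$$
AP = A[\,\mathbf{v}_1 \ \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n\,] = [\,A\mathbf{v}_1 \ \ A\mathbf{v}_2 \ \cdots \ A\mathbf{v}_n\,] \qquad (1)
$$

while

$$
PD = P \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{bmatrix}
= [\,\lambda_1\mathbf{v}_1 \ \ \lambda_2\mathbf{v}_2 \ \cdots \ \lambda_n\mathbf{v}_n\,] \qquad (2)
$$

Now suppose $A$ is diagonalizable and $A = PDP^{-1}$. Then right-multiplying this relation
by $P$, we have $AP = PD$. In this case, equations (1) and (2) imply that

$$
[\,A\mathbf{v}_1 \ \ A\mathbf{v}_2 \ \cdots \ A\mathbf{v}_n\,] = [\,\lambda_1\mathbf{v}_1 \ \ \lambda_2\mathbf{v}_2 \ \cdots \ \lambda_n\mathbf{v}_n\,] \qquad (3)
$$

Equating columns, we find that

$$
A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_n = \lambda_n\mathbf{v}_n \qquad (4)
$$

Since $P$ is invertible, its columns $\mathbf{v}_1, \ldots, \mathbf{v}_n$ must be linearly independent. Also, since
these columns are nonzero, the equations in (4) show that $\lambda_1, \ldots, \lambda_n$ are eigenvalues and
$\mathbf{v}_1, \ldots, \mathbf{v}_n$ are corresponding eigenvectors. This argument proves the "only if" parts of
the first and second statements, along with the third statement, of the theorem.

Finally, given any $n$ eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$, use them to construct the columns of
$P$ and use corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$ to construct $D$. By equations (1)-(3),
$AP = PD$. This is true without any condition on the eigenvectors. If, in fact, the
eigenvectors are linearly independent, then $P$ is invertible (by the Invertible Matrix
Theorem), and $AP = PD$ implies that $A = PDP^{-1}$. ■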
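The relation $AP = PD$ from the proof, and the power formula $A^k = PD^kP^{-1}$ from Example 2, can both be checked with integer arithmetic. The sketch below assumes NumPy and uses the matrices of Example 2; the function names `power_via_diagonalization` and `closed_form` are hypothetical labels chosen for this illustration.

```python
import numpy as np

# Matrices from Example 2 (all entries are integers, so the arithmetic is exact).
A = np.array([[7, 2],
              [-4, 1]])
P = np.array([[1, 1],
              [-1, -2]])
D = np.diag([5, 3])
P_inv = np.array([[2, 1],
                  [-1, -1]])


def power_via_diagonalization(k):
    """A^k computed as P D^k P^{-1}; D^k only needs the k-th powers of 5 and 3."""
    Dk = np.diag([5 ** k, 3 ** k])
    return P @ Dk @ P_inv


def closed_form(k):
    """Entry-by-entry formula for A^k derived in Example 2."""
    return np.array([[2 * 5 ** k - 3 ** k, 5 ** k - 3 ** k],
                     [2 * 3 ** k - 2 * 5 ** k, 2 * 3 ** k - 5 ** k]])


print(A @ P)                              # equals P @ D, as in the proof
print(power_via_diagonalization(4))
print(np.linalg.matrix_power(A, 4))       # repeated multiplication, for comparison
```

The design point is that $D^k$ costs only two scalar powers, whereas forming $A^k$ directly requires repeated matrix multiplications.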

Diagonalizing Matrices 

EXAMPLE 3 Diagonalize the following matrix, if possible.

$$
A = \begin{bmatrix} 1 & 3 & 3 \\ -3 & -5 & -3 \\ 3 & 3 & 1 \end{bmatrix}
$$

That is, find an invertible matrix $P$ and a diagonal matrix $D$ such that $A = PDP^{-1}$.

SOLUTION There are four steps to implement the description in Theorem 5.

Step 1. Find the eigenvalues of A. As mentioned in Section 5.2, the mechanics of this
step are appropriate for a computer when the matrix is larger than $2 \times 2$. To avoid
unnecessary distractions, the text will usually supply information needed for this step.
In the present case, the characteristic equation turns out to involve a cubic polynomial
that can be factored:

$$
\begin{aligned}
0 = \det(A - \lambda I) &= -\lambda^3 - 3\lambda^2 + 4 \\
&= -(\lambda - 1)(\lambda + 2)^2
\end{aligned}
$$

The eigenvalues are $\lambda = 1$ and $\lambda = -2$.

Step 2. Find three linearly independent eigenvectors of A. Three vectors are needed
because $A$ is a $3 \times 3$ matrix. This is the critical step. If it fails, then Theorem 5 says
that $A$ cannot be diagonalized. The method in Section 5.1 produces a basis for each
eigenspace:

$$
\text{Basis for } \lambda = 1\text{:} \quad \mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}
$$

$$
\text{Basis for } \lambda = -2\text{:} \quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}
\quad \text{and} \quad \mathbf{v}_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}
$$

You can check that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set.

Step 3. Construct P from the vectors in step 2. The order of the vectors is unimportant.
Using the order chosen in step 2, form

$$
P = [\,\mathbf{v}_1 \ \ \mathbf{v}_2 \ \ \mathbf{v}_3\,] = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
$$

Step 4. Construct D from the corresponding eigenvalues. In this step, it is essential
that the order of the eigenvalues matches the order chosen for the columns of $P$.
Use the eigenvalue $\lambda = -2$ twice, once for each of the eigenvectors corresponding to
$\lambda = -2$:

$$
D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}
$$

It is a good idea to check that $P$ and $D$ really work. To avoid computing $P^{-1}$,
simply verify that $AP = PD$. This is equivalent to $A = PDP^{-1}$ when $P$ is invertible.
(However, be sure that $P$ is invertible!) Compute

$$
AP = \begin{bmatrix} 1 & 3 & 3 \\ -3 & -5 & -3 \\ 3 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 2 \\ -1 & -2 & 0 \\ 1 & 0 & -2 \end{bmatrix}
$$

$$
PD = \begin{bmatrix} 1 & -1 & -1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 2 \\ -1 & -2 & 0 \\ 1 & 0 & -2 \end{bmatrix} \quad ■
$$

EXAMPLE 4 Diagonalize the following matrix, if possible.

$$
A = \begin{bmatrix} 2 & 4 & 3 \\ -4 & -6 & -3 \\ 3 & 3 & 1 \end{bmatrix}
$$

SOLUTION The characteristic equation of $A$ turns out to be exactly the same as that in
Example 3:

$$0 = \det(A - \lambda I) = -\lambda^3 - 3\lambda^2 + 4 = -(\lambda - 1)(\lambda + 2)^2$$

The eigenvalues are $\lambda = 1$ and $\lambda = -2$. However, it is easy to verify that each
eigenspace is only one-dimensional:

$$
\text{Basis for } \lambda = 1\text{:} \quad \mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}
$$

$$
\text{Basis for } \lambda = -2\text{:} \quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}
$$

There are no other eigenvalues, and every eigenvector of $A$ is a multiple of either $\mathbf{v}_1$
or $\mathbf{v}_2$. Hence it is impossible to construct a basis of $\mathbb{R}^3$ using eigenvectors of $A$. By
Theorem 5, $A$ is not diagonalizable. ■

The following theorem provides a sufficient condition for a matrix to be
diagonalizable.

THEOREM 6 An $n \times n$ matrix with $n$ distinct eigenvalues is diagonalizable.

PROOF Let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ be eigenvectors corresponding to the $n$ distinct eigenvalues of
a matrix $A$. Then $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is linearly independent, by Theorem 2 in Section 5.1.
Hence $A$ is diagonalizable, by Theorem 5. ■

It is not necessary for an $n \times n$ matrix to have $n$ distinct eigenvalues in order to
be diagonalizable. The $3 \times 3$ matrix in Example 3 is diagonalizable even though it has
only two distinct eigenvalues.

EXAMPLE 5 Determine if the following matrix is diagonalizable.

$$
A = \begin{bmatrix} 5 & -8 & 1 \\ 0 & 0 & 7 \\ 0 & 0 & -2 \end{bmatrix}
$$

SOLUTION This is easy! Since the matrix is triangular, its eigenvalues are obviously 5,
0, and $-2$. Since $A$ is a $3 \times 3$ matrix with three distinct eigenvalues, $A$ is diagonalizable. ■


Matrices Whose Eigenvalues Are Not Distinct

If an $n \times n$ matrix $A$ has $n$ distinct eigenvalues, with corresponding eigenvectors $\mathbf{v}_1, \ldots,
\mathbf{v}_n$, and if $P = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_n\,]$, then $P$ is automatically invertible because its columns
are linearly independent, by Theorem 2. When $A$ is diagonalizable but has fewer than $n$
distinct eigenvalues, it is still possible to build $P$ in a way that makes $P$ automatically
invertible, as the next theorem shows. [1]


THEOREM 7

Let $A$ be an $n \times n$ matrix whose distinct eigenvalues are $\lambda_1, \ldots, \lambda_p$.

a. For $1 \leq k \leq p$, the dimension of the eigenspace for $\lambda_k$ is less than or equal to
the multiplicity of the eigenvalue $\lambda_k$.

b. The matrix $A$ is diagonalizable if and only if the sum of the dimensions of
the eigenspaces equals $n$, and this happens if and only if (i) the characteristic
polynomial factors completely into linear factors and (ii) the dimension of the
eigenspace for each $\lambda_k$ equals the multiplicity of $\lambda_k$.

c. If $A$ is diagonalizable and $\mathcal{B}_k$ is a basis for the eigenspace corresponding to $\lambda_k$
for each $k$, then the total collection of vectors in the sets $\mathcal{B}_1, \ldots, \mathcal{B}_p$ forms an
eigenvector basis for $\mathbb{R}^n$.
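The contrast between Examples 3 and 4 is exactly the situation described in part (b) of Theorem 7: both matrices have the characteristic polynomial $-(\lambda - 1)(\lambda + 2)^2$, but only one has eigenspace dimensions that add up to 3. A possible numerical check, assuming NumPy, computes each eigenspace dimension as $n - \operatorname{rank}(A - \lambda I)$. The function name `eigenspace_dimension` and the variable names are hypothetical; the rank is computed in floating point, which is reliable here because the entries are small integers.

```python
import numpy as np


def eigenspace_dimension(A, lam):
    """Dimension of Nul(A - lam I), computed as n - rank(A - lam I)."""
    n = A.shape[0]
    return int(n - np.linalg.matrix_rank(A - lam * np.eye(n)))


# Matrices from Examples 3 and 4; both have eigenvalues 1 and -2.
A3 = np.array([[1, 3, 3], [-3, -5, -3], [3, 3, 1]], dtype=float)
A4 = np.array([[2, 4, 3], [-4, -6, -3], [3, 3, 1]], dtype=float)

dims3 = {lam: eigenspace_dimension(A3, lam) for lam in (1, -2)}
dims4 = {lam: eigenspace_dimension(A4, lam) for lam in (1, -2)}

print(dims3, sum(dims3.values()))   # dimensions sum to 3: diagonalizable
print(dims4, sum(dims4.values()))   # dimensions sum to 2: not diagonalizable

# The factorization found in Example 3 satisfies AP = PD.
P = np.array([[1, -1, -1], [-1, 1, 0], [1, 0, 1]], dtype=float)
D = np.diag([1.0, -2.0, -2.0])
print(np.allclose(A3 @ P, P @ D))
```

For the matrix of Example 4, the eigenvalue $-2$ has multiplicity 2 but a one-dimensional eigenspace, so condition (ii) of part (b) fails.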


EXAMPLE 6 Diagonalize the following matrix, if possible.

$$
A = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 1 & 4 & -3 & 0 \\ -1 & -2 & 0 & -3 \end{bmatrix}
$$

[1] The proof of Theorem 7 is somewhat lengthy but not difficult. For instance, see S. Friedberg, A. Insel, and
L. Spence, Linear Algebra, 4th ed. (Englewood Cliffs, NJ: Prentice-Hall, 2002), Section 5.2.

In Exercises 5 and 6, the matrix $A$ is factored in the form $PDP^{-1}$.
Use the Diagonalization Theorem to find the eigenvalues of $A$ and
a basis for each eigenspace.


Diagonalize the matrices in Exercises 7-20, if possible. The real
eigenvalues for Exercises 11-16 and 18 are included below the
matrix.


7. 


8 . 


P 


2. P 


5 

7 

,D = 

2 

0 

_2 

3_ 

_0 

1 

"1 
_2 

2 ' 

3_ 

,D = 

'1 

0 

o' 

3_ 


-1 -1 


1 -1 

0 " 

"3 

0 

0 " 

0 

-1 

-1 " 

1 1 

-1 

0 

2 

0 

-1 

-1 

-1 

0 -1 

1 

0 

0 

3 

-1 

-1 

0 


In Exercises 3 and 4, use the factorization $A = PDP^{-1}$ to compute
$A^k$, where $k$ represents an arbitrary positive integer.


3. 


6 . A 


3 0 

-3 4 

0 0 


a 

O' 


■ 1 

0 " 

a 0 

1 

O' 


"3 0 

-1 _ 

一 3 

0 

0 一 

0 

0 

1 " 

2{a — b) 

b _ 


_2 

1 

_0 b ■ 

_-2 

1 

= 

0 1 

-3 

0 

4 

0 

-3 

1 

9 








1 0 

0 

0 

0 

3 

-1 

0 

3 


In Exercises 1 and 2, let $A = PDP^{-1}$ and compute $A^4$.


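SOLUTION Since $A$ is a triangular matrix, the eigenvalues are 5 and $-3$, each with
multiplicity 2. Using the method in Section 5.1, we find a basis for each eigenspace.

$$
\text{Basis for } \lambda = 5\text{:} \quad
\mathbf{v}_1 = \begin{bmatrix} -8 \\ 4 \\ 1 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\mathbf{v}_2 = \begin{bmatrix} -16 \\ 4 \\ 0 \\ 1 \end{bmatrix}
$$

$$
\text{Basis for } \lambda = -3\text{:} \quad
\mathbf{v}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\mathbf{v}_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
$$

The set $\{\mathbf{v}_1, \ldots, \mathbf{v}_4\}$ is linearly independent, by Theorem 7. So the matrix $P =
[\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_4\,]$ is invertible, and $A = PDP^{-1}$, where

$$
P = \begin{bmatrix} -8 & -16 & 0 & 0 \\ 4 & 4 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
D = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & -3 \end{bmatrix} \quad ■
$$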




PRACTICE PROBLEMS 


Compute A s , where A 


2. Let A 


12 


,Vi 


,and \2 


2 


.Suppose you are told that vi and 


\2 are eigenvectors of A. Use this information to diagonalize A. 

3. Let $A$ be a $4 \times 4$ matrix with eigenvalues 5, 3, and $-2$, and suppose you know that
the eigenspace for $\lambda = 3$ is two-dimensional. Do you have enough information to
determine if $A$ is diagonalizable?


5.3 EXERCISES 


2 3 

1 2 



0 2 
I 

3 0 


2 1 
_ I 

3 2 
II 

- 

- 6-6 
11 










































































"0 

1 

r 



"3 

1 

1 ' 

2 

1 

2 


12 . 

1 

3 

1 

_3 

3 

2 _ 


_ 1 

1 

3_ 

A = 

- 1 , 

5 



A 

= 2 , 

5 

2 

2 

-1 

- 


"2 

0 

- 2 " 

1 

3 

-1 


14 . 

1 

3 

2 

1 

-2 

2 


_0 

0 

3_ 

A 

= 1 , 

5 



A 

= 2 , 

3 

0 

-1 

-1 

- 


" 1 

2 

-3" 

1 

2 

1 


16 . 

2 

5 

-2 

-1 

-1 

0 _ 

1 

3 

1 


A = 0, 1 


18. 


A = 0 

"2 -2 - 2 ' 
3 -3 -2 
_2 -2 -2 _ 
A = -2,-1,0 


"5 

-3 

0 

9" 


"3 

0 

0 

0 " 

0 

3 

1 

-2 

20 . 

0 

2 

0 

0 

0 

0 

2 

0 

0 

0 

2 

0 

0 

0 

0 

2 


1 

0 

0 

3 


9. 


11 . 


13. 


15. 


17. 


19. 


In Exercises 21 and 22, $A$, $B$, $P$, and $D$ are $n \times n$ matrices.
Mark each statement True or False. Justify each answer. (Study
Theorems 5 and 6 and the examples in this section carefully before
you try these exercises.)

21. a. $A$ is diagonalizable if $A = PDP^{-1}$ for some matrix $D$ and
some invertible matrix $P$.

b. If $\mathbb{R}^n$ has a basis of eigenvectors of $A$, then $A$ is diagonalizable.

c. $A$ is diagonalizable if and only if $A$ has $n$ eigenvalues,
counting multiplicities.

d. If $A$ is diagonalizable, then $A$ is invertible.

22. a. $A$ is diagonalizable if $A$ has $n$ eigenvectors.

b. If $A$ is diagonalizable, then $A$ has $n$ distinct eigenvalues.

c. If $AP = PD$, with $D$ diagonal, then the nonzero columns
of $P$ must be eigenvectors of $A$.

d. If $A$ is invertible, then $A$ is diagonalizable.

23. $A$ is a $5 \times 5$ matrix with two eigenvalues. One eigenspace
is three-dimensional, and the other eigenspace is two-dimensional.
Is $A$ diagonalizable? Why?

24. $A$ is a $3 \times 3$ matrix with two eigenvalues. Each eigenspace
is one-dimensional. Is $A$ diagonalizable? Why?

25. $A$ is a $4 \times 4$ matrix with three eigenvalues. One eigenspace
is one-dimensional, and one of the other eigenspaces is two-dimensional.
Is it possible that $A$ is not diagonalizable? Justify your answer.

26. $A$ is a $7 \times 7$ matrix with three eigenvalues. One eigenspace is
two-dimensional, and one of the other eigenspaces is three-dimensional.
Is it possible that $A$ is not diagonalizable? Justify your answer.

27. Show that if $A$ is both diagonalizable and invertible, then so
is $A^{-1}$.

28. Show that if $A$ has $n$ linearly independent eigenvectors, then
so does $A^T$. [Hint: Use the Diagonalization Theorem.]

29. A factorization $A = PDP^{-1}$ is not unique. Demonstrate this
for the matrix $A$ in Example 2. With $D_1 = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}$, use
the information in Example 2 to find a matrix $P_1$ such that
$A = P_1D_1P_1^{-1}$.

30. With $A$ and $D$ as in Example 2, find an invertible $P_2$ unequal
to the $P$ in Example 2, such that $A = P_2DP_2^{-1}$.

31. Construct a nonzero $2 \times 2$ matrix that is invertible but not
diagonalizable.

32. Construct a nondiagonal $2 \times 2$ matrix that is diagonalizable
but not invertible.

[M] Diagonalize the matrices in Exercises 33-36. Use your
matrix program's eigenvalue command to find the eigenvalues,
and then compute bases for the eigenspaces as in Section 5.1.


33. 


34. 


35. 


36. 


13 

-12 

9 

-15 

9 

6 

-5 

9 

-15 

9 

6 

-12 

-5 

6 

9 

6 

-12 

9 

一 8 

9 

—6 

12 

12 

-6 

-2 

24 

一 6 

2 

6 

2 

72 

51 

9 

—99 

9 

0 

-63 

15 

63 

63 

72 

15 

9 

-63 

9 

0 

63 

21 

-63 

-27 


9 

-4 

-2 

-4 

56 

32 

-28 

44 

14 

-14 

6 

-14 

42 

-33 

21 

-45 


4 一 9 -7 8 2 


7 

-9 

0 

7 

14 

5 

10 

5 

-5 

-10 

•2 

3 

7 

0 

4 

3 

-13 

-7 

10 

11 


2 0 0 
2 2 0 
2 2 2 
SOLUTIONS TO PRACTICE PROBLEMS


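1. $\det(A - \lambda I) = \lambda^2 - 3\lambda + 2 = (\lambda - 2)(\lambda - 1)$. The eigenvalues are 2 and 1, and
the corresponding eigenvectors are $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Next, form

\[
P = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}, \quad
D = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad\text{and}\quad
P^{-1} = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}
\]

Since $A = PDP^{-1}$,

\[
A^8 = PD^8P^{-1}
= \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} 2^8 & 0 \\ 0 & 1^8 \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}
= \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} 256 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}
= \begin{bmatrix} 766 & -765 \\ 510 & -509 \end{bmatrix}
\]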
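The computation of $A^8$ can be sketched in a few lines. This is an illustrative check rather than part of the solution: it takes $P$, $D$, and $P^{-1}$ from the solution above, and the function names are hypothetical. It forms $PD^kP^{-1}$ for $k = 1$ and $k = 8$ and compares the second result with seven explicit multiplications by $A$.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def power_by_diagonalization(P, diag, P_inv, k):
    """Return P D^k P^{-1}, where D is diagonal with diagonal entries `diag`."""
    n = len(diag)
    # Raising a diagonal matrix to a power only raises its diagonal entries.
    D_k = [[diag[i] ** k if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(matmul(P, D_k), P_inv)

P = [[3, 1], [2, 1]]
P_inv = [[1, -1], [-2, 3]]
A = power_by_diagonalization(P, [2, 1], P_inv, 1)    # k = 1 gives A itself
A8 = power_by_diagonalization(P, [2, 1], P_inv, 8)

# Cross-check against seven explicit multiplications by A.
direct = A
for _ in range(7):
    direct = matmul(direct, A)

print(A8)            # [[766, -765], [510, -509]]
print(direct == A8)  # True
```

The diagonalized route needs only two matrix products for any exponent $k$, which is the point of the factorization.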


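2. Compute

\[
A\mathbf{v}_1 = \begin{bmatrix} -3 & 12 \\ -2 & 7 \end{bmatrix}\begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix} = 1 \cdot \mathbf{v}_1,
\quad\text{and}\quad
A\mathbf{v}_2 = \begin{bmatrix} -3 & 12 \\ -2 & 7 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 3 \end{bmatrix} = 3 \cdot \mathbf{v}_2
\]

So, $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors for the eigenvalues 1 and 3, respectively. Thus

\[
A = PDP^{-1}, \quad\text{where}\quad
P = \begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}
\quad\text{and}\quad
D = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}
\]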


3. Yes, $A$ is diagonalizable. There is a basis $\{\mathbf{v}_1, \mathbf{v}_2\}$ for the eigenspace corresponding
to $\lambda = 3$. In addition, there will be at least one eigenvector for $\lambda = 5$ and one
for $\lambda = -2$. Call them $\mathbf{v}_3$ and $\mathbf{v}_4$. Then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is linearly independent
by Theorem 2 and Practice Problem 3 in Section 5.1. There can be no additional
eigenvectors that are linearly independent from $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4$, because the vectors are
all in $\mathbb{R}^4$. Hence the eigenspaces for $\lambda = 5$ and $\lambda = -2$ are both one-dimensional.
It follows that $A$ is diagonalizable by Theorem 7(b).

SG Mastering: Eigenvalue and Eigenspace 5-14


5.4 EIGENVECTORS AND LINEAR TRANSFORMATIONS 

The goal of this section is to understand the matrix factorization $A = PDP^{-1}$ as a
statement about linear transformations. We shall see that the transformation $\mathbf{x} \mapsto A\mathbf{x}$
is essentially the same as the very simple mapping $\mathbf{u} \mapsto D\mathbf{u}$, when viewed from the
proper perspective. A similar interpretation will apply to $A$ and $D$ even when $D$ is not
a diagonal matrix.

Recall from Section 1.9 that any linear transformation $T$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be
implemented via left-multiplication by a matrix $A$, called the standard matrix of $T$.
Now we need the same sort of representation for any linear transformation between two
finite-dimensional vector spaces.
















































The Matrix of a Linear Transformation 

Let V be an n-dimensional vector space, let W be an m-dimensional vector space, and 
let T be any linear transformation from V to W. To associate a matrix with T, choose 
(ordered) bases B and C for V and W, respectively. 

Given any $\mathbf{x}$ in $V$, the coordinate vector $[\mathbf{x}]_{\mathcal{B}}$ is in $\mathbb{R}^n$ and the coordinate vector of
its image, $[T(\mathbf{x})]_{\mathcal{C}}$, is in $\mathbb{R}^m$, as shown in Fig. 1.



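The connection between $[\mathbf{x}]_{\mathcal{B}}$ and $[T(\mathbf{x})]_{\mathcal{C}}$ is easy to find. Let $\{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be the
basis $\mathcal{B}$ for $V$. If $\mathbf{x} = r_1\mathbf{b}_1 + \cdots + r_n\mathbf{b}_n$, then

\[
[\mathbf{x}]_{\mathcal{B}} = \begin{bmatrix} r_1 \\ \vdots \\ r_n \end{bmatrix}
\]

and

\[
T(\mathbf{x}) = T(r_1\mathbf{b}_1 + \cdots + r_n\mathbf{b}_n) = r_1T(\mathbf{b}_1) + \cdots + r_nT(\mathbf{b}_n) \tag{1}
\]

because $T$ is linear. Now, since the coordinate mapping from $W$ to $\mathbb{R}^m$ is linear
(Theorem 8 in Section 4.4), equation (1) leads to

\[
[T(\mathbf{x})]_{\mathcal{C}} = r_1[T(\mathbf{b}_1)]_{\mathcal{C}} + \cdots + r_n[T(\mathbf{b}_n)]_{\mathcal{C}} \tag{2}
\]

Since $\mathcal{C}$-coordinate vectors are in $\mathbb{R}^m$, the vector equation (2) can be written as a matrix
equation, namely,

\[
[T(\mathbf{x})]_{\mathcal{C}} = M[\mathbf{x}]_{\mathcal{B}} \tag{3}
\]

where

\[
M = \bigl[\, [T(\mathbf{b}_1)]_{\mathcal{C}} \;\; [T(\mathbf{b}_2)]_{\mathcal{C}} \;\; \cdots \;\; [T(\mathbf{b}_n)]_{\mathcal{C}} \,\bigr] \tag{4}
\]

The matrix $M$ is a matrix representation of $T$, called the matrix for $T$ relative to the
bases $\mathcal{B}$ and $\mathcal{C}$. See Fig. 2.

Equation (3) says that, so far as coordinate vectors are concerned, the action of $T$
on $\mathbf{x}$ may be viewed as left-multiplication by $M$.

FIGURE 2 The mapping $\mathbf{x} \mapsto T(\mathbf{x})$ (top) and multiplication by $M$, which takes $[\mathbf{x}]_{\mathcal{B}}$ to $[T(\mathbf{x})]_{\mathcal{C}}$ (bottom).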


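EXAMPLE 1 Suppose $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ is a basis for $V$ and $\mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2, \mathbf{c}_3\}$ is a basis
for $W$. Let $T : V \to W$ be a linear transformation with the property that

\[
T(\mathbf{b}_1) = 3\mathbf{c}_1 - 2\mathbf{c}_2 + 5\mathbf{c}_3 \quad\text{and}\quad T(\mathbf{b}_2) = 4\mathbf{c}_1 + 7\mathbf{c}_2 - \mathbf{c}_3
\]

Find the matrix $M$ for $T$ relative to $\mathcal{B}$ and $\mathcal{C}$.

SOLUTION The $\mathcal{C}$-coordinate vectors of the images of $\mathbf{b}_1$ and $\mathbf{b}_2$ are

\[
[T(\mathbf{b}_1)]_{\mathcal{C}} = \begin{bmatrix} 3 \\ -2 \\ 5 \end{bmatrix}
\quad\text{and}\quad
[T(\mathbf{b}_2)]_{\mathcal{C}} = \begin{bmatrix} 4 \\ 7 \\ -1 \end{bmatrix}
\]

Hence, by (4),

\[
M = \begin{bmatrix} 3 & 4 \\ -2 & 7 \\ 5 & -1 \end{bmatrix}
\]

■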
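Equations (3) and (4) translate directly into code. The sketch below is illustrative only: it uses the coordinate vectors of Example 1, while the input vector $\mathbf{x} = 2\mathbf{b}_1 - \mathbf{b}_2$ and the function names are assumptions made for the demonstration. It assembles $M$ column by column and then computes $[T(\mathbf{x})]_{\mathcal{C}} = M[\mathbf{x}]_{\mathcal{B}}$.

```python
def matrix_from_columns(columns):
    """Assemble a matrix (list of rows) whose columns are the given vectors."""
    return [list(row) for row in zip(*columns)]

def apply(M, coords):
    """Multiply the matrix M by a coordinate vector."""
    return [sum(m * c for m, c in zip(row, coords)) for row in M]

# C-coordinate vectors of T(b1) and T(b2), taken from Example 1.
T_b1 = [3, -2, 5]
T_b2 = [4, 7, -1]
M = matrix_from_columns([T_b1, T_b2])    # equation (4)

# Hypothetical input: x = 2*b1 - b2, so [x]_B = (2, -1).
x_B = [2, -1]
Tx_C = apply(M, x_B)                     # equation (3)

print(M)       # [[3, 4], [-2, 7], [5, -1]]
print(Tx_C)    # [2, -11, 11]
```

The result agrees with linearity applied by hand: $T(\mathbf{x}) = 2T(\mathbf{b}_1) - T(\mathbf{b}_2) = 2\mathbf{c}_1 - 11\mathbf{c}_2 + 11\mathbf{c}_3$.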


If $\mathcal{B}$ and $\mathcal{C}$ are bases for the same space $V$ and if $T$ is the identity transformation
$T(\mathbf{x}) = \mathbf{x}$ for $\mathbf{x}$ in $V$, then the matrix $M$ in (4) is just a change-of-coordinates matrix (see
Section 4.7).


Linear Transformations from V into V 


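In the common case where $W$ is the same as $V$ and the basis $\mathcal{C}$ is the same as $\mathcal{B}$, the
matrix $M$ in (4) is called the matrix for $T$ relative to $\mathcal{B}$, or simply the $\mathcal{B}$-matrix for $T$,
and is denoted by $[T]_{\mathcal{B}}$. See Fig. 3.

The $\mathcal{B}$-matrix for $T : V \to V$ satisfies

\[
[T(\mathbf{x})]_{\mathcal{B}} = [T]_{\mathcal{B}}[\mathbf{x}]_{\mathcal{B}}, \quad\text{for all } \mathbf{x} \text{ in } V \tag{5}
\]

FIGURE 3 The mapping $\mathbf{x} \mapsto T(\mathbf{x})$ (top) and multiplication by $[T]_{\mathcal{B}}$, which takes $[\mathbf{x}]_{\mathcal{B}}$ to $[T(\mathbf{x})]_{\mathcal{B}}$ (bottom).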


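EXAMPLE 2 The mapping $T : \mathbb{P}_2 \to \mathbb{P}_2$ defined by

\[
T(a_0 + a_1t + a_2t^2) = a_1 + 2a_2t
\]

is a linear transformation. (Calculus students will recognize $T$ as the differentiation
operator.)

a. Find the $\mathcal{B}$-matrix for $T$, when $\mathcal{B}$ is the basis $\{1, t, t^2\}$.

b. Verify that $[T(\mathbf{p})]_{\mathcal{B}} = [T]_{\mathcal{B}}[\mathbf{p}]_{\mathcal{B}}$ for each $\mathbf{p}$ in $\mathbb{P}_2$.

SOLUTION

a. Compute the images of the basis vectors:

$T(1) = 0$    The zero polynomial

$T(t) = 1$    The polynomial whose value is always 1

$T(t^2) = 2t$

Then write the $\mathcal{B}$-coordinate vectors of $T(1)$, $T(t)$, and $T(t^2)$ (which are found by
inspection in this example) and place them together as the $\mathcal{B}$-matrix for $T$:

\[
[T(1)]_{\mathcal{B}} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad
[T(t)]_{\mathcal{B}} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
[T(t^2)]_{\mathcal{B}} = \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix}
\]

\[
[T]_{\mathcal{B}} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}
\]

b. For a general $\mathbf{p}(t) = a_0 + a_1t + a_2t^2$,

\[
[T(\mathbf{p})]_{\mathcal{B}} = [a_1 + 2a_2t]_{\mathcal{B}} = \begin{bmatrix} a_1 \\ 2a_2 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
= [T]_{\mathcal{B}}[\mathbf{p}]_{\mathcal{B}}
\]

See Fig. 4.

FIGURE 4 Matrix representation of a linear transformation.

■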
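Part (b) of Example 2 can be tried on a concrete polynomial. In the sketch below, which is only an illustration, the polynomial $7 - 3t + 5t^2$ and the helper names are assumptions; the matrix is the $\mathcal{B}$-matrix found in part (a). The code compares multiplication by $[T]_{\mathcal{B}}$ with differentiation carried out directly on the coefficients.

```python
# B-matrix of the differentiation operator on P_2, for B = {1, t, t^2}.
T_B = [[0, 1, 0],
       [0, 0, 2],
       [0, 0, 0]]

def apply(M, coords):
    """Multiply the matrix M by a coordinate vector."""
    return [sum(m * c for m, c in zip(row, coords)) for row in M]

def differentiate(coeffs):
    """Differentiate a0 + a1*t + a2*t^2 and return B-coordinates of the result."""
    a0, a1, a2 = coeffs
    return [a1, 2 * a2, 0]

# Hypothetical polynomial p(t) = 7 - 3t + 5t^2, so [p]_B = (7, -3, 5).
p_B = [7, -3, 5]

print(apply(T_B, p_B))                          # [-3, 10, 0], i.e. -3 + 10t
print(apply(T_B, p_B) == differentiate(p_B))    # True
```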


Linear Transformations on $\mathbb{R}^n$

In an applied problem involving $\mathbb{R}^n$, a linear transformation $T$ usually appears first as
a matrix transformation, $\mathbf{x} \mapsto A\mathbf{x}$. If $A$ is diagonalizable, then there is a basis $\mathcal{B}$ for $\mathbb{R}^n$
consisting of eigenvectors of $A$. Theorem 8 below shows that, in this case, the $\mathcal{B}$-matrix
for $T$ is diagonal. Diagonalizing $A$ amounts to finding a diagonal matrix representation
of $\mathbf{x} \mapsto A\mathbf{x}$.


THEOREM 8 Diagonal Matrix Representation

Suppose $A = PDP^{-1}$, where $D$ is a diagonal $n \times n$ matrix. If $\mathcal{B}$ is the basis for
$\mathbb{R}^n$ formed from the columns of $P$, then $D$ is the $\mathcal{B}$-matrix for the transformation
$\mathbf{x} \mapsto A\mathbf{x}$.


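PROOF Denote the columns of $P$ by $\mathbf{b}_1, \ldots, \mathbf{b}_n$, so that $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and
$P = [\,\mathbf{b}_1 \;\cdots\; \mathbf{b}_n\,]$. In this case, $P$ is the change-of-coordinates matrix $P_{\mathcal{B}}$ discussed in
Section 4.4, where

\[
P[\mathbf{x}]_{\mathcal{B}} = \mathbf{x} \quad\text{and}\quad [\mathbf{x}]_{\mathcal{B}} = P^{-1}\mathbf{x}
\]

If $T(\mathbf{x}) = A\mathbf{x}$ for $\mathbf{x}$ in $\mathbb{R}^n$, then

\[
\begin{aligned}
[T]_{\mathcal{B}} &= \bigl[\, [T(\mathbf{b}_1)]_{\mathcal{B}} \;\cdots\; [T(\mathbf{b}_n)]_{\mathcal{B}} \,\bigr] && \text{Definition of } [T]_{\mathcal{B}} \\
&= \bigl[\, [A\mathbf{b}_1]_{\mathcal{B}} \;\cdots\; [A\mathbf{b}_n]_{\mathcal{B}} \,\bigr] && \text{Since } T(\mathbf{x}) = A\mathbf{x} \\
&= [\, P^{-1}A\mathbf{b}_1 \;\cdots\; P^{-1}A\mathbf{b}_n \,] && \text{Change of coordinates} \\
&= P^{-1}A[\,\mathbf{b}_1 \;\cdots\; \mathbf{b}_n\,] && \text{Matrix multiplication} \\
&= P^{-1}AP
\end{aligned}
\tag{6}
\]

Since $A = PDP^{-1}$, we have $[T]_{\mathcal{B}} = P^{-1}AP = D$. ■

EXAMPLE 3 Define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\mathbf{x}) = A\mathbf{x}$, where $A = \begin{bmatrix} 7 & 2 \\ -4 & 1 \end{bmatrix}$. Find a
basis $\mathcal{B}$ for $\mathbb{R}^2$ with the property that the $\mathcal{B}$-matrix for $T$ is a diagonal matrix.

SOLUTION From Example 2 in Section 5.3, we know that $A = PDP^{-1}$, where

\[
P = \begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}
\quad\text{and}\quad
D = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}
\]

The columns of $P$, call them $\mathbf{b}_1$ and $\mathbf{b}_2$, are eigenvectors of $A$. By Theorem 8, $D$ is the
$\mathcal{B}$-matrix for $T$ when $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$. The mappings $\mathbf{x} \mapsto A\mathbf{x}$ and $\mathbf{u} \mapsto D\mathbf{u}$ describe the
same linear transformation, relative to different bases. ■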



Similarity of Matrix Representations 

The proof of Theorem 8 did not use the information that $D$ was diagonal. Hence,
if $A$ is similar to a matrix $C$, with $A = PCP^{-1}$, then $C$ is the $\mathcal{B}$-matrix for the
transformation $\mathbf{x} \mapsto A\mathbf{x}$ when the basis $\mathcal{B}$ is formed from the columns of $P$. The
factorization $A = PCP^{-1}$ is shown in Fig. 5.

FIGURE 5 Similarity of two matrix representations: $A = PCP^{-1}$. Multiplication by $A$ takes $\mathbf{x}$ to $A\mathbf{x}$; equivalently, multiplication by $P^{-1}$ takes $\mathbf{x}$ to $[\mathbf{x}]_{\mathcal{B}}$, multiplication by $C$ takes $[\mathbf{x}]_{\mathcal{B}}$ to $[A\mathbf{x}]_{\mathcal{B}}$, and multiplication by $P$ takes $[A\mathbf{x}]_{\mathcal{B}}$ to $A\mathbf{x}$.

Conversely, if $T : \mathbb{R}^n \to \mathbb{R}^n$ is defined by $T(\mathbf{x}) = A\mathbf{x}$, and if $\mathcal{B}$ is any basis for
$\mathbb{R}^n$, then the $\mathcal{B}$-matrix for $T$ is similar to $A$. In fact, the calculations in the proof of
Theorem 8 show that if $P$ is the matrix whose columns come from the vectors in $\mathcal{B}$,
then $[T]_{\mathcal{B}} = P^{-1}AP$. Thus, the set of all matrices similar to a matrix $A$ coincides with
the set of all matrix representations of the transformation $\mathbf{x} \mapsto A\mathbf{x}$.


EXAMPLE 4 Let $A = \begin{bmatrix} 4 & -9 \\ 4 & -8 \end{bmatrix}$, $\mathbf{b}_1 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, and $\mathbf{b}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. The characteristic
polynomial of $A$ is $(\lambda + 2)^2$, but the eigenspace for the eigenvalue $-2$ is only one-dimensional;
so $A$ is not diagonalizable. However, the basis $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ has the
property that the $\mathcal{B}$-matrix for the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is a triangular matrix called
the Jordan form of $A$.¹ Find this $\mathcal{B}$-matrix.

SOLUTION If $P = [\,\mathbf{b}_1 \;\; \mathbf{b}_2\,]$, then the $\mathcal{B}$-matrix is $P^{-1}AP$. Compute

\[
AP = \begin{bmatrix} 4 & -9 \\ 4 & -8 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} -6 & -1 \\ -4 & 0 \end{bmatrix}
\]

\[
P^{-1}AP = \begin{bmatrix} -1 & 2 \\ 2 & -3 \end{bmatrix}\begin{bmatrix} -6 & -1 \\ -4 & 0 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 0 & -2 \end{bmatrix}
\]

Notice that the eigenvalue of $A$ is on the diagonal. ■

¹ Every square matrix $A$ is similar to a matrix in Jordan form. The basis used to produce a Jordan form
consists of eigenvectors and so-called "generalized eigenvectors" of $A$. See Chapter 9 of Applied Linear
Algebra, 3rd ed. (Englewood Cliffs, NJ: Prentice-Hall, 1988), by B. Noble and J. W. Daniel.

NUMERICAL NOTE

An efficient way to compute a $\mathcal{B}$-matrix $P^{-1}AP$ is to compute $AP$ and then to row
reduce the augmented matrix $[\,P \;\; AP\,]$ to $[\,I \;\; P^{-1}AP\,]$. A separate computation
of $P^{-1}$ is unnecessary. See Exercise 15 in Section 2.2.
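The procedure in the Numerical Note can be sketched as follows. This is an illustrative implementation under stated assumptions, not a library routine: the function name `b_matrix` is hypothetical, exact fractions stand in for floating-point arithmetic, and the test data are $A = \begin{bmatrix} 4 & -9 \\ 4 & -8 \end{bmatrix}$ and $P = \begin{bmatrix} 3 & 2 \\ 2 & 1 \end{bmatrix}$, the matrices of Example 4.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def b_matrix(P, A):
    """Row reduce [P | AP] to [I | P^{-1}AP] and return the right-hand block."""
    n = len(P)
    AP = matmul(A, P)
    aug = [[Fraction(v) for v in P[i]] + [Fraction(v) for v in AP[i]] for i in range(n)]
    for col in range(n):
        # Swap a row with a nonzero entry into the pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Clear the rest of the pivot column.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A = [[4, -9], [4, -8]]
P = [[3, 2], [2, 1]]
J = b_matrix(P, A)
print([[int(v) for v in row] for row in J])   # [[-2, 1], [0, -2]]
```

The inverse of $P$ never appears explicitly; the row operations that turn $P$ into $I$ are applied to $AP$ at the same time.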


PRACTICE PROBLEMS 

1. Find $T(a_0 + a_1t + a_2t^2)$, if $T$ is the linear transformation from $\mathbb{P}_2$ to $\mathbb{P}_2$ whose
matrix relative to $\mathcal{B} = \{1, t, t^2\}$ is

\[
[T]_{\mathcal{B}} = \begin{bmatrix} 3 & 4 & 0 \\ 0 & 5 & -1 \\ 1 & -2 & 7 \end{bmatrix}
\]

2. Let $A$, $B$, and $C$ be $n \times n$ matrices. The text has shown that if $A$ is similar to $B$,
then $B$ is similar to $A$. This property, together with the statements below, shows that
"similar to" is an equivalence relation. (Row equivalence is another example of an
equivalence relation.) Verify parts (a) and (b).

a. $A$ is similar to $A$.

b. If $A$ is similar to $B$ and $B$ is similar to $C$, then $A$ is similar to $C$.


5.4 EXERCISES 


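1. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ and $\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2\}$ be bases for vector
spaces $V$ and $W$, respectively. Let $T : V \to W$ be a linear
transformation with the property that

\[
T(\mathbf{b}_1) = 3\mathbf{d}_1 - 5\mathbf{d}_2, \quad T(\mathbf{b}_2) = -\mathbf{d}_1 + 6\mathbf{d}_2, \quad T(\mathbf{b}_3) = 4\mathbf{d}_2
\]

Find the matrix for $T$ relative to $\mathcal{B}$ and $\mathcal{D}$.

2. Let $\mathcal{D} = \{\mathbf{d}_1, \mathbf{d}_2\}$ and $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$ be bases for vector spaces
$V$ and $W$, respectively. Let $T : V \to W$ be a linear transformation
with the property that

\[
T(\mathbf{d}_1) = 3\mathbf{b}_1 - 3\mathbf{b}_2, \quad T(\mathbf{d}_2) = -2\mathbf{b}_1 + 5\mathbf{b}_2
\]

Find the matrix for $T$ relative to $\mathcal{D}$ and $\mathcal{B}$.

3. Let $\mathcal{E} = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be the standard basis for $\mathbb{R}^3$, let
$\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ be a basis for a vector space $V$, and let
$T : \mathbb{R}^3 \to V$ be a linear transformation with the property that

\[
T(x_1, x_2, x_3) = (2x_3 - x_2)\mathbf{b}_1 - (2x_2)\mathbf{b}_2 + (x_1 + 3x_3)\mathbf{b}_3
\]

a. Compute $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$, and $T(\mathbf{e}_3)$.

b. Compute $[T(\mathbf{e}_1)]_{\mathcal{B}}$, $[T(\mathbf{e}_2)]_{\mathcal{B}}$, and $[T(\mathbf{e}_3)]_{\mathcal{B}}$.

c. Find the matrix for $T$ relative to $\mathcal{E}$ and $\mathcal{B}$.

4. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ be a basis for a vector space $V$ and let
$T : V \to \mathbb{R}^2$ be a linear transformation with the property that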


r(xibi + x 2 b 2 + ^b 3 )= 


2xi _ 3x2 + X 3 
— 2,xi ~h 


Find the matrix for $T$ relative to $\mathcal{B}$ and the standard basis for
$\mathbb{R}^2$.

5. Let $T : \mathbb{P}_2 \to \mathbb{P}_3$ be the transformation that maps a polynomial
$\mathbf{p}(t)$ into the polynomial $(t + 3)\mathbf{p}(t)$.

a. Find the image of $\mathbf{p}(t) = 3 - 2t + t^2$.

b. Show that $T$ is a linear transformation.

c. Find the matrix for $T$ relative to the bases $\{1, t, t^2\}$ and
$\{1, t, t^2, t^3\}$.

6. Let $T : \mathbb{P}_2 \to \mathbb{P}_4$ be the transformation that maps a polynomial
$\mathbf{p}(t)$ into the polynomial $\mathbf{p}(t) + 2t^2\mathbf{p}(t)$.

a. Find the image of $\mathbf{p}(t) = 3 - 2t + t^2$.

b. Show that $T$ is a linear transformation.

c. Find the matrix for $T$ relative to the bases $\{1, t, t^2\}$ and
$\{1, t, t^2, t^3, t^4\}$.

7. Assume the mapping $T : \mathbb{P}_2 \to \mathbb{P}_2$ defined by

\[
T(a_0 + a_1t + a_2t^2) = 3a_0 + (5a_0 - 2a_1)t + (4a_1 + a_2)t^2
\]

is linear. Find the matrix representation of $T$ relative to the
basis $\mathcal{B} = \{1, t, t^2\}$.

8. Let $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ be a basis for a vector space $V$. Find
$T(4\mathbf{b}_1 - 3\mathbf{b}_2)$ when $T$ is a linear transformation from $V$ to
$V$ whose matrix relative to $\mathcal{B}$ is

\[
[T]_{\mathcal{B}} = \begin{bmatrix} 0 & 0 & 1 \\ 2 & 1 & -2 \\ 1 & 3 & 1 \end{bmatrix}
\]

9. Define $T : \mathbb{P}_2 \to \mathbb{R}^3$ by $T(\mathbf{p}) = \begin{bmatrix} \mathbf{p}(-1) \\ \mathbf{p}(0) \\ \mathbf{p}(1) \end{bmatrix}$.

a. Find the image under $T$ of $\mathbf{p}(t) = 5 + 3t$.

b. Show that $T$ is a linear transformation.

c. Find the matrix for $T$ relative to the basis $\{1, t, t^2\}$ for $\mathbb{P}_2$
and the standard basis for $\mathbb{R}^3$.

10. Define $T : \mathbb{P}_3 \to \mathbb{R}^4$ by $T(\mathbf{p}) = \begin{bmatrix} \mathbf{p}(-2) \\ \mathbf{p}(3) \\ \mathbf{p}(1) \\ \mathbf{p}(0) \end{bmatrix}$.

a. Show that $T$ is a linear transformation.

b. Find the matrix for $T$ relative to the basis $\{1, t, t^2, t^3\}$ for
$\mathbb{P}_3$ and the standard basis for $\mathbb{R}^4$.

In Exercises 11 and 12, find the $\mathcal{B}$-matrix for the transformation
$\mathbf{x} \mapsto A\mathbf{x}$, where $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2\}$.


11 . A 


12. A ： 


,bi 


,bi 


,b 2 


,b 2 


In Exercises 13-16, define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\mathbf{x}) = A\mathbf{x}$. Find a
basis $\mathcal{B}$ for $\mathbb{R}^2$ with the property that $[T]_{\mathcal{B}}$ is diagonal.


13. A ： 


15. A 


17. Let A 


14. A 


16. A ： 


b 〗 


2 


and B = {bi,b 2 }, for bi = 
.Define T : R 2 —^ E 2 by T (x) = Ax. 


diagonalizable. 
b. Find the $\mathcal{B}$-matrix for $T$.

18. Define $T : \mathbb{R}^3 \to \mathbb{R}^3$ by $T(\mathbf{x}) = A\mathbf{x}$, where $A$ is a $3 \times 3$
matrix with eigenvalues 5, 5, and $-2$. Does there exist a basis
$\mathcal{B}$ for $\mathbb{R}^3$ such that the $\mathcal{B}$-matrix for $T$ is a diagonal matrix?
Discuss.

Verify the statements in Exercises 19-24. The matrices are square.

19. If $A$ is invertible and similar to $B$, then $B$ is invertible
and $A^{-1}$ is similar to $B^{-1}$. [Hint: $P^{-1}AP = B$ for some
invertible $P$. Explain why $B$ is invertible. Then find an
invertible $Q$ such that $Q^{-1}A^{-1}Q = B^{-1}$.]

20. If $A$ is similar to $B$, then $A^2$ is similar to $B^2$.

21. If $B$ is similar to $A$ and $C$ is similar to $A$, then $B$ is similar
to $C$.

22. If $A$ is diagonalizable and $B$ is similar to $A$, then $B$ is also
diagonalizable.

23. If $B = P^{-1}AP$ and $\mathbf{x}$ is an eigenvector of $A$ corresponding
to an eigenvalue $\lambda$, then $P^{-1}\mathbf{x}$ is an eigenvector of $B$
corresponding also to $\lambda$.

24. If $A$ and $B$ are similar, then they have the same rank. [Hint:
Refer to Supplementary Exercises 13 and 14 in Chapter 4.]

25. The trace of a square matrix $A$ is the sum of the diagonal
entries in $A$ and is denoted by $\operatorname{tr} A$. It can be verified that
$\operatorname{tr}(FG) = \operatorname{tr}(GF)$ for any two $n \times n$ matrices $F$ and $G$.
Show that if $A$ and $B$ are similar, then $\operatorname{tr} A = \operatorname{tr} B$.

26. It can be shown that the trace of a matrix $A$ equals the sum of
the eigenvalues of $A$. Verify this statement for the case when
$A$ is diagonalizable.

27. Let $V$ be $\mathbb{R}^n$ with a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$; let $W$ be $\mathbb{R}^n$
with the standard basis, denoted here by $\mathcal{E}$; and consider the
identity transformation $I : \mathbb{R}^n \to \mathbb{R}^n$, where $I(\mathbf{x}) = \mathbf{x}$. Find
the matrix for $I$ relative to $\mathcal{B}$ and $\mathcal{E}$. What was this matrix
called in Section 4.4?

28. Let $V$ be a vector space with a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$, let $W$
be the same space $V$ with a basis $\mathcal{C} = \{\mathbf{c}_1, \ldots, \mathbf{c}_n\}$, and let $I$
be the identity transformation $I : V \to W$. Find the matrix
for $I$ relative to $\mathcal{B}$ and $\mathcal{C}$. What was this matrix called in
Section 4.7?

29. Let $V$ be a vector space with a basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$. Find
the $\mathcal{B}$-matrix for the identity transformation $I : V \to V$.

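[M] In Exercises 30 and 31, find the $\mathcal{B}$-matrix for the transformation
$\mathbf{x} \mapsto A\mathbf{x}$, where $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$.

30. $A = \begin{bmatrix} 6 & -2 & -2 \\ 3 & 1 & -2 \\ 2 & -2 & 2 \end{bmatrix}$,
$\mathbf{b}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} -1 \\ -1 \\ 0 \end{bmatrix}$

31. $A = \begin{bmatrix} -7 & -48 & -16 \\ 1 & 14 & 6 \\ -3 & -45 & -19 \end{bmatrix}$,
$\mathbf{b}_1 = \begin{bmatrix} -3 \\ 1 \\ -3 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} -2 \\ 1 \\ -3 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix}$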


32. [M] Let $T$ be the transformation whose standard matrix is
given below. Find a basis for $\mathbb{R}^4$ with the property that $[T]_{\mathcal{B}}$
is diagonal.

-6 4 0 9' 


A : 


-3 
SOLUTIONS TO PRACTICE PROBLEMS

1. Let $\mathbf{p}(t) = a_0 + a_1t + a_2t^2$ and compute

\[
[T(\mathbf{p})]_{\mathcal{B}} = [T]_{\mathcal{B}}[\mathbf{p}]_{\mathcal{B}}
= \begin{bmatrix} 3 & 4 & 0 \\ 0 & 5 & -1 \\ 1 & -2 & 7 \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
= \begin{bmatrix} 3a_0 + 4a_1 \\ 5a_1 - a_2 \\ a_0 - 2a_1 + 7a_2 \end{bmatrix}
\]

So $T(\mathbf{p}) = (3a_0 + 4a_1) + (5a_1 - a_2)t + (a_0 - 2a_1 + 7a_2)t^2$.

2. a. $A = (I)^{-1}AI$, so $A$ is similar to $A$.

b. By hypothesis, there exist invertible matrices $P$ and $Q$ with the property that
$B = P^{-1}AP$ and $C = Q^{-1}BQ$. Substitute the formula for $B$ into the formula
for $C$, and use a fact about the inverse of a product:

\[
C = Q^{-1}BQ = Q^{-1}(P^{-1}AP)Q = (PQ)^{-1}A(PQ)
\]

This equation has the proper form to show that $A$ is similar to $C$.


5.5 COMPLEX EIGENVALUES 


Since the characteristic equation of an $n \times n$ matrix involves a polynomial of degree $n$,
the equation always has exactly $n$ roots, counting multiplicities, provided that possibly
complex roots are included. This section shows that if the characteristic equation of
a real matrix $A$ has some complex roots, then these roots provide critical information
about $A$. The key is to let $A$ act on the space $\mathbb{C}^n$ of $n$-tuples of complex numbers.¹

Our interest in $\mathbb{C}^n$ does not arise from a desire to "generalize" the results of the
earlier chapters, although that would in fact open up significant new applications of
linear algebra.² Rather, this study of complex eigenvalues is essential in order to uncover
"hidden" information about certain matrices with real entries that arise in a variety of
real-life problems. Such problems include many real dynamical systems that involve
periodic motion, vibration, or some type of rotation in space.

The matrix eigenvalue-eigenvector theory already developed for $\mathbb{R}^n$ applies
equally well to $\mathbb{C}^n$. So a complex scalar $\lambda$ satisfies $\det(A - \lambda I) = 0$ if and only if
there is a nonzero vector $\mathbf{x}$ in $\mathbb{C}^n$ such that $A\mathbf{x} = \lambda\mathbf{x}$. We call $\lambda$ a (complex) eigenvalue
and $\mathbf{x}$ a (complex) eigenvector corresponding to $\lambda$.


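EXAMPLE 1 If $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ on $\mathbb{R}^2$
rotates the plane counterclockwise through a quarter-turn. The action of $A$ is periodic,
since after four quarter-turns, a vector is back where it started. Obviously, no nonzero
vector is mapped into a multiple of itself, so $A$ has no eigenvectors in $\mathbb{R}^2$ and hence no
real eigenvalues. In fact, the characteristic equation of $A$ is

\[
\lambda^2 + 1 = 0
\]

The only roots are complex: $\lambda = i$ and $\lambda = -i$. However, if we permit $A$ to act on $\mathbb{C}^2$,
then

\[
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -i \end{bmatrix} = \begin{bmatrix} i \\ 1 \end{bmatrix} = i\begin{bmatrix} 1 \\ -i \end{bmatrix},
\qquad
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix} = \begin{bmatrix} -i \\ 1 \end{bmatrix} = -i\begin{bmatrix} 1 \\ i \end{bmatrix}
\]

Thus $i$ and $-i$ are eigenvalues, with $\begin{bmatrix} 1 \\ -i \end{bmatrix}$ and $\begin{bmatrix} 1 \\ i \end{bmatrix}$ as corresponding eigenvectors. (A
method for finding complex eigenvectors is discussed in Example 2.) ■

¹ Refer to Appendix B for a brief discussion of complex numbers. Matrix algebra and concepts about
real vector spaces carry over to the case with complex entries and scalars. In particular, $A(c\mathbf{x} + d\mathbf{y}) = cA\mathbf{x} + dA\mathbf{y}$,
for $A$ an $m \times n$ matrix with complex entries, $\mathbf{x}$, $\mathbf{y}$ in $\mathbb{C}^n$, and $c$, $d$ in $\mathbb{C}$.

² A second course in linear algebra often discusses such topics. They are of particular importance in
electrical engineering.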


The main focus of this section will be on the matrix in the next example. 


EXAMPLE 2 Let $A = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}$. Find the eigenvalues of $A$, and find a basis
for each eigenspace.

SOLUTION The characteristic equation of $A$ is

\[
0 = \det\begin{bmatrix} .5 - \lambda & -.6 \\ .75 & 1.1 - \lambda \end{bmatrix} = (.5 - \lambda)(1.1 - \lambda) - (-.6)(.75) = \lambda^2 - 1.6\lambda + 1
\]

From the quadratic formula, $\lambda = \tfrac{1}{2}\bigl[1.6 \pm \sqrt{(-1.6)^2 - 4}\,\bigr] = .8 \pm .6i$. For the
eigenvalue $\lambda = .8 - .6i$, construct

\[
A - (.8 - .6i)I = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix} - \begin{bmatrix} .8 - .6i & 0 \\ 0 & .8 - .6i \end{bmatrix}
= \begin{bmatrix} -.3 + .6i & -.6 \\ .75 & .3 + .6i \end{bmatrix} \tag{1}
\]

Row reduction of the usual augmented matrix is quite unpleasant by hand because of the
complex arithmetic. However, here is a nice observation that really simplifies matters:
Since $.8 - .6i$ is an eigenvalue, the system

\[
\begin{aligned}
(-.3 + .6i)x_1 - .6x_2 &= 0 \\
.75x_1 + (.3 + .6i)x_2 &= 0
\end{aligned}
\tag{2}
\]

has a nontrivial solution (with $x_1$ and $x_2$ possibly complex numbers). Therefore, both
equations in (2) determine the same relationship between $x_1$ and $x_2$, and either equation
can be used to express one variable in terms of the other.³

The second equation in (2) leads to

\[
\begin{aligned}
.75x_1 &= (-.3 - .6i)x_2 \\
x_1 &= (-.4 - .8i)x_2
\end{aligned}
\]

Choose $x_2 = 5$ to eliminate the decimals, and obtain $x_1 = -2 - 4i$. A basis for the
eigenspace corresponding to $\lambda = .8 - .6i$ is

\[
\mathbf{v}_1 = \begin{bmatrix} -2 - 4i \\ 5 \end{bmatrix}
\]

Analogous calculations for $\lambda = .8 + .6i$ produce the eigenvector

\[
\mathbf{v}_2 = \begin{bmatrix} -2 + 4i \\ 5 \end{bmatrix}
\]

As a check on the work, compute

\[
A\mathbf{v}_2 = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}\begin{bmatrix} -2 + 4i \\ 5 \end{bmatrix} = \begin{bmatrix} -4 + 2i \\ 4 + 3i \end{bmatrix} = (.8 + .6i)\mathbf{v}_2
\]

■

³ Another way to see this is to realize that the matrix in equation (1) is not invertible, so its rows are linearly
dependent (as vectors in $\mathbb{C}^2$), and hence one row is a (complex) multiple of the other.
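The steps of Example 2 can be followed numerically. The sketch below is an illustration under stated assumptions: it is limited to $2 \times 2$ matrices, it assumes the lower-left entry is nonzero so that the second equation of the system can be solved for $x_1$, and the function names are hypothetical. It finds the eigenvalues from the quadratic formula and an eigenvector from the second row of $A - \lambda I$, with the free choice $x_2 = 5$ as in the example.

```python
import cmath

A = [[0.5, -0.6], [0.75, 1.1]]

# Characteristic equation of a 2x2 matrix: lambda^2 - (trace) lambda + det = 0.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = cmath.sqrt(trace ** 2 - 4 * det)
lam_plus, lam_minus = (trace + root) / 2, (trace - root) / 2

def eigenvector(A, lam, x2=5):
    """Solve the second row of (A - lam*I)x = 0 for x1, with x2 chosen freely."""
    x1 = -(A[1][1] - lam) * x2 / A[1][0]
    return [x1, x2]

def times(A, v):
    """Multiply a 2x2 matrix by a vector with (possibly complex) entries."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

v1 = eigenvector(A, lam_minus)    # approximately (-2 - 4i, 5)
# Size of A v1 - lambda v1, entry by entry; both should be near zero.
residual = [abs(a - lam_minus * b) for a, b in zip(times(A, v1), v1)]
print(lam_minus, v1, residual)
```

Because the arithmetic is floating point, the printed values match $.8 - .6i$ and $(-2 - 4i, 5)$ only up to rounding.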


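Surprisingly, the matrix $A$ in Example 2 determines a transformation $\mathbf{x} \mapsto A\mathbf{x}$ that
is essentially a rotation. This fact becomes evident when appropriate points are plotted.

EXAMPLE 3 One way to see how multiplication by the matrix $A$ in Example 2
affects points is to plot an arbitrary initial point, say, $\mathbf{x}_0 = (2, 0)$, and then to plot
successive images of this point under repeated multiplications by $A$. That is, plot

\[
\begin{aligned}
\mathbf{x}_1 &= A\mathbf{x}_0 = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1.0 \\ 1.5 \end{bmatrix} \\
\mathbf{x}_2 &= A\mathbf{x}_1 = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}\begin{bmatrix} 1.0 \\ 1.5 \end{bmatrix} = \begin{bmatrix} -.4 \\ 2.4 \end{bmatrix} \\
\mathbf{x}_3 &= A\mathbf{x}_2, \ldots
\end{aligned}
\]

Figure 1 shows $\mathbf{x}_0, \ldots, \mathbf{x}_8$ as larger dots. The smaller dots are the locations of $\mathbf{x}_9, \ldots, \mathbf{x}_{100}$.
The sequence lies along an elliptical orbit. ■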
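The points plotted in Example 3 can be generated with a short loop. This is a minimal sketch, with hypothetical function names, that only computes the iterates $\mathbf{x}_0, \ldots, \mathbf{x}_{100}$; plotting them is left to whatever graphics tool is at hand.

```python
A = [[0.5, -0.6], [0.75, 1.1]]

def step(A, x):
    """One step of the iteration x -> A x."""
    return (A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1])

def orbit(A, x0, count):
    """Return the list [x0, x1, ..., x_count]."""
    points = [x0]
    for _ in range(count):
        points.append(step(A, points[-1]))
    return points

points = orbit(A, (2.0, 0.0), 100)
# Show the first few iterates, rounded to suppress floating-point noise.
for k in range(4):
    print(k, tuple(round(c, 4) for c in points[k]))
```

The first lines of output reproduce $\mathbf{x}_1 = (1.0, 1.5)$ and $\mathbf{x}_2 = (-.4, 2.4)$ from the example.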


FIGURE 1 Iterates of a point $\mathbf{x}_0$ under the action of a matrix with a
complex eigenvalue.

Of course, Fig. 1 does not explain why the rotation occurs. The secret to the rotation 
is hidden in the real and imaginary parts of a complex eigenvector. 


Real and Imaginary Parts of Vectors 

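The complex conjugate of a complex vector $\mathbf{x}$ in $\mathbb{C}^n$ is the vector $\overline{\mathbf{x}}$ in $\mathbb{C}^n$ whose entries
are the complex conjugates of the entries in $\mathbf{x}$. The real and imaginary parts of a
complex vector $\mathbf{x}$ are the vectors $\operatorname{Re}\mathbf{x}$ and $\operatorname{Im}\mathbf{x}$ in $\mathbb{R}^n$ formed from the real and imaginary
parts of the entries of $\mathbf{x}$.

EXAMPLE 4 If $\mathbf{x} = \begin{bmatrix} 3 - i \\ i \\ 2 + 5i \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix} + i\begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix}$, then

\[
\operatorname{Re}\mathbf{x} = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix}, \quad
\operatorname{Im}\mathbf{x} = \begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix}, \quad\text{and}\quad
\overline{\mathbf{x}} = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix} - i\begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 + i \\ -i \\ 2 - 5i \end{bmatrix}
\]

■

If $B$ is an $m \times n$ matrix with possibly complex entries, then $\overline{B}$ denotes the matrix
whose entries are the complex conjugates of the entries in $B$. Properties of conjugates
for complex numbers carry over to complex matrix algebra:

\[
\overline{r\mathbf{x}} = \overline{r}\,\overline{\mathbf{x}}, \quad
\overline{B\mathbf{x}} = \overline{B}\,\overline{\mathbf{x}}, \quad
\overline{BC} = \overline{B}\,\overline{C}, \quad\text{and}\quad
\overline{rB} = \overline{r}\,\overline{B}
\]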

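Eigenvalues and Eigenvectors of a Real Matrix
That Acts on $\mathbb{C}^n$

Let $A$ be an $n \times n$ matrix whose entries are real. Then $\overline{A\mathbf{x}} = \overline{A}\,\overline{\mathbf{x}} = A\overline{\mathbf{x}}$. If $\lambda$ is an
eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector in $\mathbb{C}^n$, then

\[
A\overline{\mathbf{x}} = \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \overline{\lambda}\,\overline{\mathbf{x}}
\]

Hence $\overline{\lambda}$ is also an eigenvalue of $A$, with $\overline{\mathbf{x}}$ a corresponding eigenvector. This shows that
when $A$ is real, its complex eigenvalues occur in conjugate pairs. (Here and elsewhere,
we use the term complex eigenvalue to refer to an eigenvalue $\lambda = a + bi$, with $b \neq 0$.)

EXAMPLE 5 The eigenvalues of the real matrix in Example 2 are complex conjugates,
namely, $.8 - .6i$ and $.8 + .6i$. The corresponding eigenvectors found in
Example 2 are also conjugates:

\[
\mathbf{v}_1 = \begin{bmatrix} -2 - 4i \\ 5 \end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix} -2 + 4i \\ 5 \end{bmatrix} = \overline{\mathbf{v}}_1
\]

■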
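The conjugate-pair property can be observed numerically with the data of Example 2. In this illustrative sketch (the helper names are assumptions), the code forms $\overline{\mathbf{v}}_1$ and checks that $A\overline{\mathbf{v}}_1$ agrees with $\overline{\lambda}\,\overline{\mathbf{v}}_1$ up to floating-point rounding.

```python
A = [[0.5, -0.6], [0.75, 1.1]]
lam = 0.8 - 0.6j
v = [-2 - 4j, 5 + 0j]        # eigenvector for lam, from Example 2

def times(A, x):
    """Multiply a real matrix by a complex vector."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]

def conj(x):
    """Entrywise complex conjugate of a vector."""
    return [c.conjugate() for c in x]

# Because A is real, A conj(v) should equal conj(lam) * conj(v).
lhs = times(A, conj(v))
rhs = [lam.conjugate() * c for c in conj(v)]
gap = max(abs(l - r) for l, r in zip(lhs, rhs))
print(gap)    # a number near zero
```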

The next example provides the basic “building block” for all real 2x2 matrices 
with complex eigenvalues. 



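EXAMPLE 6 If $C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$, where $a$ and $b$ are real and not both zero, then the
eigenvalues of $C$ are $\lambda = a \pm bi$. (See the Practice Problem at the end of this section.)
Also, if $r = |\lambda| = \sqrt{a^2 + b^2}$, then

\[
C = r\begin{bmatrix} a/r & -b/r \\ b/r & a/r \end{bmatrix}
= \begin{bmatrix} r & 0 \\ 0 & r \end{bmatrix}\begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}
\]

where $\varphi$ is the angle between the positive $x$-axis and the ray from $(0, 0)$ through $(a, b)$.
See Fig. 2 and Appendix B. The angle $\varphi$ is called the argument of $\lambda = a + bi$. Thus
the transformation $\mathbf{x} \mapsto C\mathbf{x}$ may be viewed as the composition of a rotation through the
angle $\varphi$ and a scaling by $|\lambda|$ (see Fig. 3). ■

FIGURE 2 The point $(a, b)$ in the complex plane (axes $\operatorname{Re} z$ and $\operatorname{Im} z$), at distance $r$ from the origin and angle $\varphi$ from the positive real axis.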
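The scale factor and angle in Example 6 are the modulus and argument of the complex number $a + bi$, so they can be read off with the standard library. The sketch below is illustrative: the sample values $a = 1$, $b = \sqrt{3}$ and the function names are assumptions chosen so that the expected answer, $r = 2$ and $\varphi = \pi/3$, is easy to recognize.

```python
import cmath
import math

def rotation_scaling(a, b):
    """Return (r, phi) such that [[a, -b], [b, a]] is r times a rotation by phi."""
    return cmath.polar(complex(a, b))    # modulus and argument of a + bi

def rebuild(r, phi):
    """Scaling by r composed with a rotation through the angle phi."""
    return [[r * math.cos(phi), -r * math.sin(phi)],
            [r * math.sin(phi),  r * math.cos(phi)]]

# Hypothetical entries a = 1, b = sqrt(3).
a, b = 1.0, math.sqrt(3.0)
r, phi = rotation_scaling(a, b)
C = rebuild(r, phi)
print(r, phi)    # close to 2 and pi/3
print(C)         # close to [[a, -b], [b, a]]
```

`cmath.polar` returns the argument in the interval $(-\pi, \pi]$, which matches the usual convention for the angle of rotation.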


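FIGURE 3 A rotation followed by a scaling.

Finally, we are ready to uncover the rotation that is hidden within a real matrix
having a complex eigenvalue.

EXAMPLE 7 Let $A = \begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}$, $\lambda = .8 - .6i$, and $\mathbf{v}_1 = \begin{bmatrix} -2 - 4i \\ 5 \end{bmatrix}$, as in
Example 2. Also, let $P$ be the $2 \times 2$ real matrix

\[
P = [\,\operatorname{Re}\mathbf{v}_1 \;\; \operatorname{Im}\mathbf{v}_1\,] = \begin{bmatrix} -2 & -4 \\ 5 & 0 \end{bmatrix}
\]

and let

\[
C = P^{-1}AP = \frac{1}{20}\begin{bmatrix} 0 & 4 \\ -5 & -2 \end{bmatrix}\begin{bmatrix} .5 & -.6 \\ .75 & 1.1 \end{bmatrix}\begin{bmatrix} -2 & -4 \\ 5 & 0 \end{bmatrix} = \begin{bmatrix} .8 & -.6 \\ .6 & .8 \end{bmatrix}
\]

By Example 6, $C$ is a pure rotation because $|\lambda|^2 = (.8)^2 + (.6)^2 = 1$. From
$C = P^{-1}AP$, we obtain

\[
A = PCP^{-1} = P\begin{bmatrix} .8 & -.6 \\ .6 & .8 \end{bmatrix}P^{-1}
\]

Here is the rotation "inside" $A$! The matrix $P$ provides a change of variable, say,
$\mathbf{x} = P\mathbf{u}$. The action of $A$ amounts to a change of variable from $\mathbf{x}$ to $\mathbf{u}$, followed by
a rotation, and then a return to the original variable. See Fig. 4. The rotation produces
an ellipse, as in Fig. 1, instead of a circle, because the coordinate system determined
by the columns of $P$ is not rectangular and does not have equal unit lengths on the two
axes. ■

FIGURE 4 Rotation due to a complex eigenvalue. A change of variable takes $\mathbf{x}$ to $\mathbf{u}$, the rotation takes $\mathbf{u}$ to $C\mathbf{u}$, and a change of variable takes $C\mathbf{u}$ to $A\mathbf{x}$.

The next theorem shows that the calculations in Example 7 can be carried out for
any $2 \times 2$ real matrix $A$ having a complex eigenvalue $\lambda$. The proof uses the fact that
if the entries in $A$ are real, then $A(\operatorname{Re}\mathbf{x}) = \operatorname{Re} A\mathbf{x}$ and $A(\operatorname{Im}\mathbf{x}) = \operatorname{Im} A\mathbf{x}$, and if $\mathbf{x}$ is an
eigenvector for a complex eigenvalue, then $\operatorname{Re}\mathbf{x}$ and $\operatorname{Im}\mathbf{x}$ are linearly independent in
$\mathbb{R}^2$. (See Exercises 25 and 26.) The details are omitted.

THEOREM 9

Let $A$ be a real $2 \times 2$ matrix with a complex eigenvalue $\lambda = a - bi$ ($b \neq 0$) and
an associated eigenvector $\mathbf{v}$ in $\mathbb{C}^2$. Then

\[
A = PCP^{-1}, \quad\text{where}\quad P = [\,\operatorname{Re}\mathbf{v} \;\; \operatorname{Im}\mathbf{v}\,] \quad\text{and}\quad C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}
\]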
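Theorem 9 can be tested on the matrix of Examples 2 and 7. The following sketch is illustrative and its helper names are assumptions. It builds $P$ and $C$ from the eigenvalue $\lambda = .8 - .6i$ and eigenvector $\mathbf{v}_1$, checks the equivalent statement $AP = PC$, and then confirms that the coordinates $\mathbf{u} = P^{-1}\mathbf{x}$ keep a constant length along the orbit of $\mathbf{x}_0 = (2, 0)$, which is what a pure rotation in the $\mathbf{u}$ variable requires.

```python
A = [[0.5, -0.6], [0.75, 1.1]]
lam = 0.8 - 0.6j                # lambda = a - bi, so a = 0.8 and b = 0.6
v = [-2 - 4j, 5 + 0j]

a, b = lam.real, -lam.imag
P = [[v[0].real, v[0].imag],
     [v[1].real, v[1].imag]]    # P = [Re v  Im v]
C = [[a, -b], [b, a]]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

AP, PC = matmul(A, P), matmul(P, C)
mismatch = max(abs(AP[i][j] - PC[i][j]) for i in range(2) for j in range(2))

def solve2(P, x):
    """Solve P u = x for a 2x2 matrix P by Cramer's rule."""
    d = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return ((x[0] * P[1][1] - P[0][1] * x[1]) / d,
            (P[0][0] * x[1] - x[0] * P[1][0]) / d)

# Length of u = P^{-1} x along the orbit of x0 = (2, 0).
x = (2.0, 0.0)
norms = []
for _ in range(50):
    u = solve2(P, x)
    norms.append((u[0] ** 2 + u[1] ** 2) ** 0.5)
    x = (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])

print(mismatch, min(norms), max(norms))
```

The lengths stay at 0.5 up to rounding, so in the $\mathbf{u}$ coordinates the orbit lies on a circle, and $P$ maps that circle to the ellipse of Fig. 1.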
FIGURE 5

Iterates of two points under the 
action of a 3 x 3 matrix with a 
complex eigenvalue. 


The phenomenon displayed in Example 7 persists in higher dimensions. For
instance, if $A$ is a $3 \times 3$ matrix with a complex eigenvalue, then there is a plane in
$\mathbb{R}^3$ on which $A$ acts as a rotation (possibly combined with scaling). Every vector in that
plane is rotated into another point on the same plane. We say that the plane is invariant
under $A$.

EXAMPLE 8 The matrix $A = \begin{bmatrix} .8 & -.6 & 0 \\ .6 & .8 & 0 \\ 0 & 0 & 1.07 \end{bmatrix}$ has eigenvalues $.8 \pm .6i$ and
1.07. Any vector $\mathbf{w}_0$ in the $x_1x_2$-plane (with third coordinate 0) is rotated by $A$ into
another point in the plane. Any vector $\mathbf{x}_0$ not in the plane has its $x_3$-coordinate multiplied
by 1.07. The iterates of the points $\mathbf{w}_0 = (2, 0, 0)$ and $\mathbf{x}_0 = (2, 0, 1)$ under multiplication
by $A$ are shown in Fig. 5. ■


PRACTICE PROBLEM 


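Show that if $a$ and $b$ are real, then the eigenvalues of $A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ are $a \pm bi$, with
corresponding eigenvectors $\begin{bmatrix} 1 \\ -i \end{bmatrix}$ and $\begin{bmatrix} 1 \\ i \end{bmatrix}$.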


5.5 EXERCISES 


Let each matrix in Exercises 1-6 act on C 2 . Find the eigenvalues 
and a basis for each eigenspace in C 2 . 



-2 




3. 





-2 


5. 






In Exercises 7-12, use Example 6 to list the eigenvalues of $A$.
In each case, the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is the composition of a
rotation and a scaling. Give the angle $\varphi$ of the rotation, where
$-\pi < \varphi \le \pi$, and give the scale factor $r$.

In Exercises 13-20, find an invertible matrix $P$ and a matrix $C$
of the form $\begin{bmatrix} a & -b \\ b & a \end{bmatrix}$ such that the given matrix has the form
$A = PCP^{-1}$.




-3 


15. 

0 

5" 

2 

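16. $\begin{bmatrix} 4 & -2 \\ 1 & 6 \end{bmatrix}$

17. $\begin{bmatrix} -11 & -4 \\ 20 & 5 \end{bmatrix}$

18. $\begin{bmatrix} 3 & -5 \\ 2 & 5 \end{bmatrix}$

19. $\begin{bmatrix} 1.52 & -.7 \\ .56 & .4 \end{bmatrix}$

20. $\begin{bmatrix} -3 & -8 \\ 4 & 5 \end{bmatrix}$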

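21. In Example 2, solve the first equation in (2) for $x_2$ in terms of
$x_1$, and from that produce the eigenvector $\mathbf{y} = \begin{bmatrix} 2 \\ -1 + 2i \end{bmatrix}$
for the matrix $A$. Show that this $\mathbf{y}$ is a (complex) multiple of
the vector $\mathbf{v}_1$ used in Example 2.

22. Let $A$ be a complex (or real) $n \times n$ matrix, and let $\mathbf{x}$ in $\mathbb{C}^n$ be
an eigenvector corresponding to an eigenvalue $\lambda$ in $\mathbb{C}$. Show
that for each nonzero complex scalar $\mu$, the vector $\mu\mathbf{x}$ is an
eigenvector of $A$.

Chapter 7 will focus on matrices $A$ with the property that $A^T = A$.
Exercises 23 and 24 show that every eigenvalue of such a matrix
is necessarily real.

23. Let $A$ be an $n \times n$ real matrix with the property that $A^T = A$,
let $\mathbf{x}$ be any vector in $\mathbb{C}^n$, and let $q = \overline{\mathbf{x}}^T A\mathbf{x}$. The equalities
below show that $q$ is a real number by verifying that $\overline{q} = q$.
Give a reason for each step.

\[
\overline{q} \underset{(a)}{=} \overline{\overline{\mathbf{x}}^T A\mathbf{x}} \underset{(b)}{=} \mathbf{x}^T\,\overline{A\mathbf{x}} \underset{(c)}{=} \mathbf{x}^T A\overline{\mathbf{x}} \underset{(d)}{=} (\mathbf{x}^T A\overline{\mathbf{x}})^T \underset{(e)}{=} \overline{\mathbf{x}}^T A^T\mathbf{x} = q
\]

24. Let $A$ be an $n \times n$ real matrix with the property that $A^T = A$.
Show that if $A\mathbf{x} = \lambda\mathbf{x}$ for some nonzero vector $\mathbf{x}$ in $\mathbb{C}^n$, then,
in fact, $\lambda$ is real and the real part of $\mathbf{x}$ is an eigenvector of $A$.
[Hint: Compute $\overline{\mathbf{x}}^T A\mathbf{x}$, and use Exercise 23. Also, examine
the real and imaginary parts of $A\mathbf{x}$.]

25. Let $A$ be a real $n \times n$ matrix, and let $\mathbf{x}$ be a vector in $\mathbb{C}^n$.
Show that $\operatorname{Re}(A\mathbf{x}) = A(\operatorname{Re}\mathbf{x})$ and $\operatorname{Im}(A\mathbf{x}) = A(\operatorname{Im}\mathbf{x})$.

26. Let $A$ be a real $2 \times 2$ matrix with a complex eigenvalue
$\lambda = a - bi$ ($b \neq 0$) and an associated eigenvector $\mathbf{v}$ in $\mathbb{C}^2$.

a. Show that $A(\operatorname{Re}\mathbf{v}) = a\operatorname{Re}\mathbf{v} + b\operatorname{Im}\mathbf{v}$ and $A(\operatorname{Im}\mathbf{v}) = -b\operatorname{Re}\mathbf{v} + a\operatorname{Im}\mathbf{v}$.
[Hint: Write $\mathbf{v} = \operatorname{Re}\mathbf{v} + i\operatorname{Im}\mathbf{v}$, and compute $A\mathbf{v}$.]

b. Verify that if $P$ and $C$ are given as in Theorem 9, then
$AP = PC$.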

[M] In Exercises 27 and 28, find a factorization of the given
matrix $A$ in the form $A = PCP^{-1}$, where $C$ is a block-diagonal
matrix with $2 \times 2$ blocks of the form shown in Example 6. (For
each conjugate pair of eigenvalues, use the real and imaginary
parts of one eigenvector in $\mathbb{C}^4$ to create two columns of $P$.)




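27. $A = \begin{bmatrix} 26 & 33 & 23 & 20 \\ -6 & -8 & -1 & -13 \\ -14 & -19 & -16 & 3 \\ -20 & -20 & -20 & -14 \end{bmatrix}$

28. $A = \begin{bmatrix} 7 & 11 & 20 & 17 \\ -20 & -40 & -86 & -74 \\ 0 & -5 & -10 & -10 \\ 10 & 28 & 60 & 53 \end{bmatrix}$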


SOLUTION TO PRACTICE PROBLEM 


Remember that it is easy to test whether a vector is an eigenvector. There is no need to 
examine the characteristic equation. Compute 


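$$A\mathbf{x} = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}\begin{bmatrix} 1 \\ -i \end{bmatrix} = \begin{bmatrix} a + bi \\ b - ai \end{bmatrix} = (a + bi)\begin{bmatrix} 1 \\ -i \end{bmatrix}$$

Thus $\begin{bmatrix} 1 \\ -i \end{bmatrix}$ is an eigenvector corresponding to $\lambda = a + bi$. From the discussion in this
section, $\begin{bmatrix} 1 \\ i \end{bmatrix}$ must be an eigenvector corresponding to $\bar{\lambda} = a - bi$.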


5.6 DISCRETE DYNAMICAL SYSTEMS 

Eigenvalues and eigenvectors provide the key to understanding the long-term behavior,
or evolution, of a dynamical system described by a difference equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$.
Such an equation was used to model population movement in Section 1.10, various
Markov chains in Section 4.9, and the spotted owl population in the introductory
example for this chapter. The vectors $\mathbf{x}_k$ give information about the system as time
(denoted by $k$) passes. In the spotted owl example, for instance, $\mathbf{x}_k$ listed the numbers
of owls in three age classes at time $k$.

The applications in this section focus on ecological problems because they are easier
to state and explain than, say, problems in physics or engineering. However, dynamical
systems arise in many scientific fields. For instance, standard undergraduate courses
in control systems discuss several aspects of dynamical systems. The modern state-space
design method in such courses relies heavily on matrix algebra. 1 The steady-state
response of a control system is the engineering equivalent of what we call here the
“long-term behavior” of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$.


1 See G. F. Franklin, J. D. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems, 5th ed.
(Upper Saddle River, NJ: Prentice-Hall, 2006). This undergraduate text has a nice introduction to dynamic
models (Chapter 2). State-space design is covered in Chapters 7 and 8.
Until Example 6, we assume that $A$ is diagonalizable, with $n$ linearly independent
eigenvectors, $\mathbf{v}_1, \ldots, \mathbf{v}_n$, and corresponding eigenvalues, $\lambda_1, \ldots, \lambda_n$. For convenience,
assume the eigenvectors are arranged so that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. Since
$\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is a basis for $\mathbb{R}^n$, any initial vector $\mathbf{x}_0$ can be written uniquely as

$$\mathbf{x}_0 = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n \tag{1}$$

This eigenvector decomposition of $\mathbf{x}_0$ determines what happens to the sequence $\{\mathbf{x}_k\}$.
The next calculation generalizes the simple case examined in Example 5 of Section 5.2.
Since the $\mathbf{v}_i$ are eigenvectors,

$$\mathbf{x}_1 = A\mathbf{x}_0 = c_1A\mathbf{v}_1 + \cdots + c_nA\mathbf{v}_n = c_1\lambda_1\mathbf{v}_1 + \cdots + c_n\lambda_n\mathbf{v}_n$$

In general,

$$\mathbf{x}_k = c_1(\lambda_1)^k\mathbf{v}_1 + \cdots + c_n(\lambda_n)^k\mathbf{v}_n \qquad (k = 0, 1, 2, \ldots) \tag{2}$$

The examples that follow illustrate what can happen in (2) as $k \to \infty$.
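Formula (2) can be checked numerically against direct iteration of the difference equation. The sketch below is only an illustration: the matrix, its eigenpairs, the weights, and the helper names (`mat_vec`, `x_by_formula`, `x_by_iteration`) are assumptions chosen for easy arithmetic and do not come from the text.

```python
# Illustrative data only: this matrix, its eigenpairs, and x0 are not from the text.
A = [[2.0, 1.0],
     [1.0, 2.0]]
eigenvalues = [3.0, 1.0]
eigenvectors = [[1.0, 1.0], [1.0, -1.0]]   # v_1 and v_2
coeffs = [2.0, 1.0]                        # x0 = 2*v_1 + 1*v_2 = (3, 1)


def mat_vec(M, x):
    """Return the matrix-vector product M x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def x_by_formula(k):
    """Evaluate equation (2): x_k = sum_i c_i * lambda_i**k * v_i."""
    n = len(eigenvectors[0])
    x = [0.0] * n
    for c, lam, v in zip(coeffs, eigenvalues, eigenvectors):
        for i in range(n):
            x[i] += c * lam ** k * v[i]
    return x


def x_by_iteration(k):
    """Apply x_{j+1} = A x_j a total of k times, starting from x0."""
    x = x_by_formula(0)
    for _ in range(k):
        x = mat_vec(A, x)
    return x


for k in range(4):
    print(k, x_by_formula(k), x_by_iteration(k))
```

The two columns of output agree for every $k$, which is the content of equation (2): once the weights $c_i$ are known, no matrix multiplication is needed to reach $\mathbf{x}_k$.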


A Predator-Prey System 

Deep in the redwood forests of California, dusky-footed wood rats provide up to 80% of 
the diet for the spotted owl, the main predator of the wood rat. Example 1 uses a linear 
dynamical system to model the physical system of the owls and the rats. (Admittedly, 
the model is unrealistic in several respects, but it can provide a starting point for the 
study of more complicated nonlinear models used by environmental scientists.) 


EXAMPLE 1 Denote the owl and wood rat populations at time $k$ by $\mathbf{x}_k = \begin{bmatrix} O_k \\ R_k \end{bmatrix}$,
where $k$ is the time in months, $O_k$ is the number of owls in the region studied, and $R_k$
is the number of rats (measured in thousands). Suppose

$$\begin{aligned} O_{k+1} &= (.5)O_k + (.4)R_k \\ R_{k+1} &= -p \cdot O_k + (1.1)R_k \end{aligned} \tag{3}$$

where $p$ is a positive parameter to be specified. The $(.5)O_k$ in the first equation says
that with no wood rats for food, only half of the owls will survive each month, while the
$(1.1)R_k$ in the second equation says that with no owls as predators, the rat population
will grow by 10% per month. If rats are plentiful, the $(.4)R_k$ will tend to make the
owl population rise, while the negative term $-p \cdot O_k$ measures the deaths of rats due to
predation by owls. (In fact, $1000p$ is the average number of rats eaten by one owl in
one month.) Determine the evolution of this system when the predation parameter $p$ is
.104.

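SOLUTION When $p = .104$, the eigenvalues of the coefficient matrix $A$ for the
equations in (3) turn out to be $\lambda_1 = 1.02$ and $\lambda_2 = .58$. Corresponding eigenvectors
are

$$\mathbf{v}_1 = \begin{bmatrix} 10 \\ 13 \end{bmatrix}, \qquad \mathbf{v}_2 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$$

An initial $\mathbf{x}_0$ can be written as $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$. Then, for $k \ge 0$,

$$\mathbf{x}_k = c_1(1.02)^k\mathbf{v}_1 + c_2(.58)^k\mathbf{v}_2 = c_1(1.02)^k\begin{bmatrix} 10 \\ 13 \end{bmatrix} + c_2(.58)^k\begin{bmatrix} 5 \\ 1 \end{bmatrix}$$

As $k \to \infty$, $(.58)^k$ rapidly approaches zero. Assume $c_1 > 0$. Then, for all sufficiently
large $k$, $\mathbf{x}_k$ is approximately the same as $c_1(1.02)^k\mathbf{v}_1$, and we write

$$\mathbf{x}_k \approx c_1(1.02)^k\begin{bmatrix} 10 \\ 13 \end{bmatrix} \tag{4}$$

The approximation in (4) improves as $k$ increases, and so for large $k$,

$$\mathbf{x}_{k+1} \approx c_1(1.02)^{k+1}\begin{bmatrix} 10 \\ 13 \end{bmatrix} = (1.02)\,c_1(1.02)^k\begin{bmatrix} 10 \\ 13 \end{bmatrix} \approx 1.02\,\mathbf{x}_k \tag{5}$$

The approximation in (5) says that eventually both entries of $\mathbf{x}_k$ (the numbers of owls
and rats) grow by a factor of almost 1.02 each month, a 2% monthly growth rate. By
(4), $\mathbf{x}_k$ is approximately a multiple of $(10, 13)$, so the entries in $\mathbf{x}_k$ are nearly in the same
ratio as 10 to 13. That is, for every 10 owls there are about 13 thousand rats. ■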
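The conclusions of Example 1 can be observed by simply iterating equations (3). The following is a minimal sketch under stated assumptions: the starting populations $(20, 15)$ are an assumption (they correspond to $c_1 = 1$, $c_2 = 2$), and the names `step`, `simulate`, `growth`, and `ratio` are introduced here for illustration.

```python
P_PREDATION = 0.104  # predation parameter p from Example 1


def step(owls, rats):
    """One month of equations (3); rats are measured in thousands."""
    return (0.5 * owls + 0.4 * rats,
            -P_PREDATION * owls + 1.1 * rats)


def simulate(owls, rats, months):
    """Return the list [x_0, x_1, ..., x_months] of (owls, rats) pairs."""
    history = [(owls, rats)]
    for _ in range(months):
        owls, rats = step(owls, rats)
        history.append((owls, rats))
    return history


# Assumed start: x0 = 1*v1 + 2*v2 = (20, 15), so that c1 = 1 > 0.
history = simulate(20.0, 15.0, 60)
(o_prev, r_prev), (o_last, r_last) = history[-2], history[-1]
growth = o_last / o_prev   # approaches 1.02, as in approximation (5)
ratio = r_last / o_last    # approaches 13/10, as in approximation (4)
print(growth, ratio)
```

After 60 months the $(.58)^k$ term is negligible, so the printed monthly growth factor and the rats-to-owls ratio sit very close to 1.02 and 1.3.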


Example 1 illustrates two general facts about a dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ in
which $A$ is $n \times n$, its eigenvalues satisfy $|\lambda_1| \ge 1$ and $1 > |\lambda_j|$ for $j = 2, \ldots, n$, and $\mathbf{v}_1$
is an eigenvector corresponding to $\lambda_1$. If $\mathbf{x}_0$ is given by equation (1), with $c_1 \ne 0$, then
for all sufficiently large $k$,

$$\mathbf{x}_{k+1} \approx \lambda_1\mathbf{x}_k \tag{6}$$

and

$$\mathbf{x}_k \approx c_1(\lambda_1)^k\mathbf{v}_1 \tag{7}$$

The approximations in (6) and (7) can be made as close as desired by taking $k$
sufficiently large. By (6), the $\mathbf{x}_k$ eventually grow almost by a factor of $\lambda_1$ each time, so
$\lambda_1$ determines the eventual growth rate of the system. Also, by (7), the ratio of any two
entries in $\mathbf{x}_k$ (for large $k$) is nearly the same as the ratio of the corresponding entries in
$\mathbf{v}_1$. The case in which $\lambda_1 = 1$ is illustrated in Example 5 in Section 5.2.


Graphical Description of Solutions 

When $A$ is $2 \times 2$, algebraic calculations can be supplemented by a geometric description
of a system’s evolution. We can view the equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ as a description of
what happens to an initial point $\mathbf{x}_0$ in $\mathbb{R}^2$ as it is transformed repeatedly by the mapping
$\mathbf{x} \mapsto A\mathbf{x}$. The graph of $\mathbf{x}_0, \mathbf{x}_1, \ldots$ is called a trajectory of the dynamical system.

EXAMPLE 2 Plot several trajectories of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, when

$$A = \begin{bmatrix} .80 & 0 \\ 0 & .64 \end{bmatrix}$$

SOLUTION The eigenvalues of $A$ are .8 and .64, with eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and
$\mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. If $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$, then

$$\mathbf{x}_k = c_1(.8)^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(.64)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Of course, $\mathbf{x}_k$ tends to $\mathbf{0}$ because $(.8)^k$ and $(.64)^k$ both approach 0 as $k \to \infty$. But the
way $\mathbf{x}_k$ goes toward $\mathbf{0}$ is interesting. Figure 1 (on page 304) shows the first few terms
of several trajectories that begin at points on the boundary of the box with corners at
$(\pm 3, \pm 3)$. The points on each trajectory are connected by a thin curve, to make the
trajectory easier to see. ■



FIGURE 1 The origin as an attractor. 


In Example 2, the origin is called an attractor of the dynamical system because
all trajectories tend toward $\mathbf{0}$. This occurs whenever both eigenvalues are less than 1
in magnitude. The direction of greatest attraction is along the line through $\mathbf{0}$ and the
eigenvector $\mathbf{v}_2$ for the eigenvalue of smaller magnitude.

In the next example, both eigenvalues of $A$ are larger than 1 in magnitude, and $\mathbf{0}$
is called a repeller of the dynamical system. All solutions of $\mathbf{x}_{k+1} = A\mathbf{x}_k$ except the
(constant) zero solution are unbounded and tend away from the origin. 2


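EXAMPLE 3 Plot several typical solutions of the equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$, where

$$A = \begin{bmatrix} 1.44 & 0 \\ 0 & 1.2 \end{bmatrix}$$

SOLUTION The eigenvalues of $A$ are 1.44 and 1.2. If $\mathbf{x}_0 = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$, then

$$\mathbf{x}_k = c_1(1.44)^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(1.2)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Both terms grow in size, but the first term grows faster. So the direction of greatest
repulsion is the line through $\mathbf{0}$ and the eigenvector for the eigenvalue of larger magnitude.
Figure 2 shows several trajectories that begin at points quite close to $\mathbf{0}$. ■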


In the next example, 0 is called a saddle point because the origin attracts solutions 
from some directions and repels them in other directions. This occurs whenever one 
eigenvalue is greater than 1 in magnitude and the other is less than 1 in magnitude. The 
direction of greatest attraction is determined by an eigenvector for the eigenvalue of 
smaller magnitude. The direction of greatest repulsion is determined by an eigenvector 
for the eigenvalue of greater magnitude. 


2 The origin is the only possible attractor or repeller in a linear dynamical system, but there can be multiple 
attractors and repellers in a more general dynamical system for which the mapping is not linear. 

In such a system, attractors and repellers are defined in terms of the eigenvalues of a special matrix (with 
variable entries) called the Jacobian matrix of the system. 
EXAMPLE 4 Plot several typical solutions of the equation $\mathbf{y}_{k+1} = D\mathbf{y}_k$, where

$$D = \begin{bmatrix} 2.0 & 0 \\ 0 & 0.5 \end{bmatrix}$$

(We write $D$ and $\mathbf{y}$ here instead of $A$ and $\mathbf{x}$ because this example will be used later.)
Show that a solution $\{\mathbf{y}_k\}$ is unbounded if its initial point is not on the $x_2$-axis.

SOLUTION The eigenvalues of $D$ are 2 and .5. If $\mathbf{y}_0 = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$, then

$$\mathbf{y}_k = c_1 2^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(.5)^k\begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{8}$$

If $\mathbf{y}_0$ is on the $x_2$-axis, then $c_1 = 0$ and $\mathbf{y}_k \to \mathbf{0}$ as $k \to \infty$. But if $\mathbf{y}_0$ is not on the $x_2$-axis,
then the first term in the sum for $\mathbf{y}_k$ becomes arbitrarily large, and so $\{\mathbf{y}_k\}$ is unbounded.
Figure 3 shows ten trajectories that begin near or on the $x_2$-axis. ■
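The attractor, repeller, and saddle-point criteria used in Examples 2 through 4 depend only on the magnitudes of the two eigenvalues. A small sketch of that test follows; the function name `classify_origin` and the label returned when some magnitude equals 1 are assumptions made for illustration, since the text does not name that borderline case here.

```python
def classify_origin(eigenvalues):
    """Classify the origin for x_{k+1} = A x_k (A is 2 x 2) from the eigenvalues of A.

    abs() returns the modulus, so complex eigenvalues are handled as well.
    """
    magnitudes = [abs(lam) for lam in eigenvalues]
    if all(m < 1 for m in magnitudes):
        return "attractor"
    if all(m > 1 for m in magnitudes):
        return "repeller"
    if any(m > 1 for m in magnitudes) and any(m < 1 for m in magnitudes):
        return "saddle point"
    return "not classified"  # a magnitude equal to 1; outside the three cases above


print(classify_origin([0.8, 0.64]))   # Example 2 -> attractor
print(classify_origin([1.44, 1.2]))   # Example 3 -> repeller
print(classify_origin([2.0, 0.5]))    # Example 4 -> saddle point
```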





FIGURE 3 The origin as a saddle point. 
Change of Variable


The preceding three examples involved diagonal matrices. To handle the nondiagonal
case, we return for a moment to the $n \times n$ case in which eigenvectors of $A$ form a
basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ for $\mathbb{R}^n$. Let $P = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_n\,]$, and let $D$ be the diagonal matrix
with the corresponding eigenvalues on the diagonal. Given a sequence $\{\mathbf{x}_k\}$ satisfying
$\mathbf{x}_{k+1} = A\mathbf{x}_k$, define a new sequence $\{\mathbf{y}_k\}$ by

$$\mathbf{y}_k = P^{-1}\mathbf{x}_k, \quad \text{or equivalently,} \quad \mathbf{x}_k = P\mathbf{y}_k$$

Substituting these relations into the equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ and using the fact that $A =
PDP^{-1}$, we find that

$$P\mathbf{y}_{k+1} = AP\mathbf{y}_k = (PDP^{-1})P\mathbf{y}_k = PD\mathbf{y}_k$$

Left-multiplying both sides by $P^{-1}$, we obtain

$$\mathbf{y}_{k+1} = D\mathbf{y}_k$$

If we write $\mathbf{y}_k$ as $\mathbf{y}(k)$ and denote the entries in $\mathbf{y}(k)$ by $y_1(k), \ldots, y_n(k)$, then

$$\begin{bmatrix} y_1(k+1) \\ y_2(k+1) \\ \vdots \\ y_n(k+1) \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix}\begin{bmatrix} y_1(k) \\ y_2(k) \\ \vdots \\ y_n(k) \end{bmatrix}$$

The change of variable from $\mathbf{x}_k$ to $\mathbf{y}_k$ has decoupled the system of difference equations.
The evolution of $y_1(k)$, for example, is unaffected by what happens to $y_2(k), \ldots, y_n(k)$,
because $y_1(k+1) = \lambda_1 \cdot y_1(k)$ for each $k$.

The equation $\mathbf{x}_k = P\mathbf{y}_k$ says that $\mathbf{y}_k$ is the coordinate vector of $\mathbf{x}_k$ with respect to
the eigenvector basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$. We can decouple the system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ by making
calculations in the new eigenvector coordinate system. When $n = 2$, this amounts to
using graph paper with axes in the directions of the two eigenvectors.

EXAMPLE 5 Show that the origin is a saddle point for solutions of $\mathbf{x}_{k+1} = A\mathbf{x}_k$,
where

$$A = \begin{bmatrix} 1.25 & -.75 \\ -.75 & 1.25 \end{bmatrix}$$

Find the directions of greatest attraction and greatest repulsion.


SOLUTION Using standard techniques, we find that $A$ has eigenvalues 2 and .5, with
corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, respectively. Since $|2| > 1$
and $|.5| < 1$, the origin is a saddle point of the dynamical system. If $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$,
then

$$\mathbf{x}_k = c_1 2^k\mathbf{v}_1 + c_2(.5)^k\mathbf{v}_2 \tag{9}$$

This equation looks just like equation (8) in Example 4, with $\mathbf{v}_1$ and $\mathbf{v}_2$ in place of the
standard basis.

On graph paper, draw axes through $\mathbf{0}$ and the eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$. See Fig. 4.
Movement along these axes corresponds to movement along the standard axes in Fig. 3.
In Fig. 4, the direction of greatest repulsion is the line through $\mathbf{0}$ and the eigenvector $\mathbf{v}_1$
whose eigenvalue is greater than 1 in magnitude. If $\mathbf{x}_0$ is on this line, the $c_2$ in (9) is zero
and $\mathbf{x}_k$ moves quickly away from $\mathbf{0}$. The direction of greatest attraction is determined
by the eigenvector $\mathbf{v}_2$ whose eigenvalue is less than 1 in magnitude.

A number of trajectories are shown in Fig. 4. When this graph is viewed in terms of
the eigenvector axes, the picture “looks” essentially the same as the picture in Fig. 3. ■
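The change of variable can be tried out on the matrix of Example 5. The sketch below assumes a starting point $\mathbf{x}_0 = (3, 1)$ chosen only for illustration, and the helper names `mat_vec` and `inverse_2x2` are hypothetical. It converts each $\mathbf{x}_k$ to eigenvector coordinates $\mathbf{y}_k = P^{-1}\mathbf{x}_k$ and confirms that each coordinate is simply scaled by its own eigenvalue.

```python
A = [[1.25, -0.75],
     [-0.75, 1.25]]
P = [[1.0, 1.0],
     [-1.0, 1.0]]    # columns are v1 = (1, -1) and v2 = (1, 1)
D = [2.0, 0.5]       # diagonal entries of D, in the same order as the columns of P


def mat_vec(M, x):
    """Product of a 2 x 2 matrix with a vector in R^2."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


P_inv = inverse_2x2(P)
x = [3.0, 1.0]                 # assumed starting point, chosen for illustration
for k in range(4):
    y = mat_vec(P_inv, x)      # eigenvector coordinates of x_k
    x_next = mat_vec(A, x)
    y_next = mat_vec(P_inv, x_next)
    # Decoupled update y_{k+1} = D y_k: each coordinate uses only its own eigenvalue.
    assert y_next == [D[0] * y[0], D[1] * y[1]]
    print(k, x, y)
    x = x_next
```

The first coordinate of $\mathbf{y}_k$ doubles at every step while the second is halved, which is the saddle-point behavior of Example 4 seen along the eigenvector axes.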



FIGURE 4 The origin as a saddle point. 


Complex Eigenvalues 

When a real $2 \times 2$ matrix $A$ has complex eigenvalues, $A$ is not diagonalizable (when
acting on $\mathbb{R}^2$), but the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ is easy to describe. Example 3
of Section 5.5 illustrated the case in which the eigenvalues have absolute value 1. The
iterates of a point $\mathbf{x}_0$ spiraled around the origin along an elliptical trajectory.

If $A$ has two complex eigenvalues whose absolute value is greater than 1, then $\mathbf{0}$ is
a repeller and iterates of $\mathbf{x}_0$ will spiral outward around the origin. If the absolute values
of the complex eigenvalues are less than 1, then the origin is an attractor and the iterates
of $\mathbf{x}_0$ spiral inward toward the origin, as in the following example.


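EXAMPLE 6 It can be verified that the matrix

$$A = \begin{bmatrix} .8 & .5 \\ -.1 & 1.0 \end{bmatrix}$$

has eigenvalues $.9 \pm .2i$, with eigenvectors $\begin{bmatrix} 1 \mp 2i \\ 1 \end{bmatrix}$. Figure 5 (on page 308) shows
three trajectories of the system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, with initial vectors
$\begin{bmatrix} 0 \\ 2.5 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 0 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ -2.5 \end{bmatrix}$. ■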
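The eigenvalues quoted in Example 6 and the inward spiral can be checked with a few lines of standard-library Python. This is a sketch only: the helper names `eigenvalues_2x2` and `iterate` are assumptions, and the eigenvalues are found from the quadratic characteristic equation rather than by any method named in the text.

```python
import cmath
import math

A = [[0.8, 0.5],
     [-0.1, 1.0]]


def eigenvalues_2x2(M):
    """Roots of lambda^2 - (trace) lambda + (det) = 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def iterate(x, steps):
    """Apply x -> A x the given number of times."""
    for _ in range(steps):
        x = (A[0][0] * x[0] + A[0][1] * x[1],
             A[1][0] * x[0] + A[1][1] * x[1])
    return x


lam_plus, lam_minus = eigenvalues_2x2(A)
modulus = abs(lam_plus)                  # sqrt(.85), roughly .92, so less than 1
far_point = iterate((0.0, 2.5), 200)     # one of the initial vectors in Example 6
print(lam_plus, lam_minus, modulus, math.hypot(*far_point))
```

Because the common modulus of the two eigenvalues is below 1, the distance of the iterates from the origin shrinks toward zero, which is the attractor behavior described before the example.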


Survival of the Spotted Owls 

Recall from this chapter’s introductory example that the spotted owl population in the
Willow Creek area of California was modeled by a dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ in
which the entries in $\mathbf{x}_k = (j_k, s_k, a_k)$ listed the numbers of females (at time $k$) in the
juvenile, subadult, and adult life stages, respectively, and $A$ is the stage-matrix

$$A = \begin{bmatrix} 0 & 0 & .33 \\ .18 & 0 & 0 \\ 0 & .71 & .94 \end{bmatrix} \tag{10}$$



FIGURE 5 Rotation associated with complex 
eigenvalues. 


MATLAB shows that the eigenvalues of $A$ are approximately $\lambda_1 = .98$,
$\lambda_2 = -.02 + .21i$, and $\lambda_3 = -.02 - .21i$. Observe that all three eigenvalues are
less than 1 in magnitude, because $|\lambda_2|^2 = |\lambda_3|^2 = (-.02)^2 + (.21)^2 = .0445$.

For the moment, let $A$ act on the complex vector space $\mathbb{C}^3$. Then, because $A$ has
three distinct eigenvalues, the three corresponding eigenvectors are linearly independent
and form a basis for $\mathbb{C}^3$. Denote the eigenvectors by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Then the general
solution of $\mathbf{x}_{k+1} = A\mathbf{x}_k$ (using vectors in $\mathbb{C}^3$) has the form

$$\mathbf{x}_k = c_1(\lambda_1)^k\mathbf{v}_1 + c_2(\lambda_2)^k\mathbf{v}_2 + c_3(\lambda_3)^k\mathbf{v}_3 \tag{11}$$

If $\mathbf{x}_0$ is a real initial vector, then $\mathbf{x}_1 = A\mathbf{x}_0$ is real because $A$ is real. Similarly, the
equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ shows that each $\mathbf{x}_k$ on the left side of (11) is real, even though
it is expressed as a sum of complex vectors. However, each term on the right side
of (11) is approaching the zero vector, because the eigenvalues are all less than 1 in
magnitude. Therefore the real sequence $\mathbf{x}_k$ approaches the zero vector, too. Sadly, this
model predicts that the spotted owls will eventually all perish.

Is there hope for the spotted owl? Recall from the introductory example that the 
18% entry in the matrix A in (10) comes from the fact that although 60% of the juvenile 
owls live long enough to leave the nest and search for new home territories, only 30% 
of that group survive the search and find new home ranges. Search survival is strongly 
influenced by the number of clear-cut areas in the forest, which make the search more 
difficult and dangerous. 

Some owl populations live in areas with few or no clear-cut areas. It may be that 
a larger percentage of the juvenile owls there survive and find new home ranges. Of 
course, the problem of the spotted owl is more complex than we have described, but the 
final example provides a happy ending to the story. 

EXAMPLE 7 Suppose the search survival rate of the juvenile owls is 50%, so the
(2,1)-entry in the stage-matrix $A$ in (10) is .3 instead of .18. What does the stage-matrix
model predict about this spotted owl population?

SOLUTION Now the eigenvalues of $A$ turn out to be approximately $\lambda_1 = 1.01$, $\lambda_2 =
-.03 + .26i$, and $\lambda_3 = -.03 - .26i$. An eigenvector for $\lambda_1$ is approximately $\mathbf{v}_1 =
(10, 3, 31)$. Let $\mathbf{v}_2$ and $\mathbf{v}_3$ be (complex) eigenvectors for $\lambda_2$ and $\lambda_3$. In this case, equation
(11) becomes

$$\mathbf{x}_k = c_1(1.01)^k\mathbf{v}_1 + c_2(-.03 + .26i)^k\mathbf{v}_2 + c_3(-.03 - .26i)^k\mathbf{v}_3$$

As $k \to \infty$, the second two vectors tend to zero. So $\mathbf{x}_k$ becomes more and more like
the (real) vector $c_1(1.01)^k\mathbf{v}_1$. The approximations in equations (6) and (7), following
Example 1, apply here. Also, it can be shown that the constant $c_1$ in the initial
decomposition of $\mathbf{x}_0$ is positive when the entries in $\mathbf{x}_0$ are nonnegative. Thus the owl
population will grow slowly, with a long-term growth rate of 1.01. The eigenvector $\mathbf{v}_1$
describes the eventual distribution of the owls by life stages: for every 31 adults, there
will be about 10 juveniles and 3 subadults. ■
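The long-term growth rates quoted for the two owl models can be estimated without an eigenvalue routine. The sketch below uses power iteration (repeated multiplication by $A$ with rescaling), which is a substitute technique and not the MATLAB computation the text refers to. The function names `stage_matrix` and `dominant_eigenpair`, the uniform starting vector, and the step count are assumptions.

```python
def stage_matrix(juvenile_to_subadult):
    """Stage matrix (10) with a chosen (2,1)-entry."""
    return [[0.0, 0.0, 0.33],
            [juvenile_to_subadult, 0.0, 0.0],
            [0.0, 0.71, 0.94]]


def dominant_eigenpair(A, steps=500):
    """Power iteration: estimate lambda_1 and an eigenvector whose entries sum to 1."""
    x = [1.0 / len(A)] * len(A)
    growth = 0.0
    for _ in range(steps):
        y = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        growth = sum(y)               # sum(x) is 1, so this is the one-step growth factor
        x = [yi / growth for yi in y]
    return growth, x


rate_18, _ = dominant_eigenpair(stage_matrix(0.18))
rate_30, v1 = dominant_eigenpair(stage_matrix(0.30))
per_31_adults = [31 * entry / v1[2] for entry in v1]   # rescale so the adult entry is 31
print(rate_18, rate_30, per_31_adults)
```

The first estimate falls just below 1 (decline) and the second just above 1 (slow growth), and the rescaled eigenvector is close to $(10, 3, 31)$, in line with approximations (6) and (7).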

Further Reading 

Franklin, G. F., J. D. Powell, and M. L. Workman. Digital Control of Dynamic Systems, 
3rd ed. Reading, MA: Addison-Wesley, 1998. 

Sandefur, James T. Discrete Dynamical Systems — Theory and Applications. Oxford: 
Oxford University Press, 1990. 

Tuchinsky, Philip. Management of a Buffalo Herd, UMAP Module 207. Lexington, 
MA: COMAP, 1980. 


PRACTICE PROBLEMS 


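1. The matrix $A$ below has eigenvalues 1, $\frac{2}{3}$, and $\frac{1}{3}$, with corresponding eigenvectors
$\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$:

$$A = \frac{1}{9}\begin{bmatrix} 7 & -2 & 0 \\ -2 & 6 & 2 \\ 0 & 2 & 5 \end{bmatrix}, \quad \mathbf{v}_1 = \begin{bmatrix} -2 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix}$$

Find the general solution of the equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$ if $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 11 \\ -2 \end{bmatrix}$.

2. What happens to the sequence $\{\mathbf{x}_k\}$ in Practice Problem 1 as $k \to \infty$?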


5.6 EXERCISES 


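1. Let $A$ be a $2 \times 2$ matrix with eigenvalues 3 and 1/3 and
corresponding eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$. Let
$\{\mathbf{x}_k\}$ be a solution of the difference equation $\mathbf{x}_{k+1} = A\mathbf{x}_k$,
$\mathbf{x}_0 = \begin{bmatrix} 9 \\ 1 \end{bmatrix}$.

a. Compute $\mathbf{x}_1 = A\mathbf{x}_0$. [Hint: You do not need to know $A$
itself.]

b. Find a formula for $\mathbf{x}_k$ involving $k$ and the eigenvectors $\mathbf{v}_1$
and $\mathbf{v}_2$.
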


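2. Suppose the eigenvalues of a $3 \times 3$ matrix $A$ are 3, 4/5, and
3/5, with corresponding eigenvectors $\begin{bmatrix} 1 \\ 0 \\ -3 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \\ -5 \end{bmatrix}$, and
$\begin{bmatrix} -3 \\ -3 \\ 7 \end{bmatrix}$. Let $\mathbf{x}_0 = \begin{bmatrix} -2 \\ -5 \\ 3 \end{bmatrix}$. Find the solution of the equation
$\mathbf{x}_{k+1} = A\mathbf{x}_k$ for the specified $\mathbf{x}_0$, and describe what happens
as $k \to \infty$.
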
3 
In Exercises 3-6, assume that any initial vector $\mathbf{x}_0$ has an eigenvector
decomposition such that the coefficient $c_1$ in equation (1)
of this section is positive. 3

3. Determine the evolution of the dynamical system in Example 1
when the predation parameter $p$ is .2 in equation (3).
(Give a formula for $\mathbf{x}_k$.) Does the owl population grow or
decline? What about the wood rat population?


4. Determine the evolution of the dynamical system in Example
1 when the predation parameter $p$ is .125. (Give a formula
for $\mathbf{x}_k$.) As time passes, what happens to the sizes of the owl
and wood rat populations? The system tends toward what is
sometimes called an unstable equilibrium. What do you think
might happen to the system if some aspect of the model (such
as birth rates or the predation rate) were to change slightly?


5. In old-growth forests of Douglas fir, the spotted owl dines
mainly on flying squirrels. Suppose the predator-prey matrix
for these two populations is $A = \begin{bmatrix} .4 & .3 \\ -p & 1.2 \end{bmatrix}$. Show that if
the predation parameter $p$ is .325, both populations grow.
Estimate the long-term growth rate and the eventual ratio of
owls to flying squirrels.


6. Show that if the predation parameter p in Exercise 5 is .5, 
both the owls and the squirrels will eventually perish. Find a 
value of p for which populations of both owls and squirrels 
tend toward constant levels. What are the relative population 
sizes in this case? 


7. Let A have the properties described in Exercise 1. 

a. Is the origin an attractor, a repeller, or a saddle point of
the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$?

b. Find the directions of greatest attraction and/or repulsion 
for this dynamical system. 

c. Make a graphical description of the system, showing the 
directions of greatest attraction or repulsion. Include 
a rough sketch of several typical trajectories (without 
computing specific points). 

8. Determine the nature of the origin (attractor, repeller, or
saddle point) for the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ if $A$ has
the properties described in Exercise 2. Find the directions of
greatest attraction or repulsion.


In Exercises 9-14, classify the origin as an attractor, repeller,
or saddle point of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$. Find the
directions of greatest attraction and/or repulsion.


9. $A = \begin{bmatrix} 1.7 & -.3 \\ -1.2 & .8 \end{bmatrix}$


10. $A = \begin{bmatrix} .3 & .4 \\ -.3 & 1.1 \end{bmatrix}$


3 One of the limitations of the model in Example 1 is that there always
exist initial population vectors $\mathbf{x}_0$ with positive entries such that the
coefficient $c_1$ is negative. The approximation (7) is still valid, but the
entries in $\mathbf{x}_k$ eventually become negative.


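11. $A = \begin{bmatrix} .4 & .5 \\ -.4 & 1.3 \end{bmatrix}$

12. $A = \begin{bmatrix} .5 & .6 \\ -.3 & 1.4 \end{bmatrix}$

13. $A = \begin{bmatrix} .8 & .3 \\ -.4 & 1.5 \end{bmatrix}$

14. $A = \begin{bmatrix} 1.7 & .6 \\ -.4 & .7 \end{bmatrix}$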

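15. Let $A = \begin{bmatrix} .4 & 0 & .2 \\ .3 & .8 & .3 \\ .3 & .2 & .5 \end{bmatrix}$. The vector $\mathbf{v}_1 = \begin{bmatrix} .1 \\ .6 \\ .3 \end{bmatrix}$ is an
eigenvector for $A$, and two eigenvalues are .5 and .2. Construct
the solution of the dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$ that
satisfies $\mathbf{x}_0 = (0, .3, .7)$. What happens to $\mathbf{x}_k$ as $k \to \infty$?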


16. [M] Produce the general solution of the dynamical system
$\mathbf{x}_{k+1} = A\mathbf{x}_k$ when $A$ is the stochastic matrix for the Hertz
Rent A Car model in Exercise 16 of Section 4.9.


17. Construct a stage-matrix model for an animal species that has 
two life stages: juvenile (up to 1 year old) and adult. Suppose 
the female adults give birth each year to an average of 1.6 
female juveniles. Each year, 30% of the juveniles survive 
to become adults and 80% of the adults survive. For $k \ge 0$,
let $\mathbf{x}_k = (j_k, a_k)$, where the entries in $\mathbf{x}_k$ are the numbers of
female juveniles and female adults in year $k$.

a. Construct the stage-matrix $A$ such that $\mathbf{x}_{k+1} = A\mathbf{x}_k$ for
$k \ge 0$.

b. Show that the population is growing, compute the eventual
growth rate of the population, and give the eventual
ratio of juveniles to adults.

c. [M] Suppose that initially there are 15 juveniles and 10 
adults in the population. Produce four graphs that show 
how the population changes over eight years: (a) the 
number of juveniles, (b) the number of adults, (c) the 
total population, and (d) the ratio of juveniles to adults 
(each year). When does the ratio in (d) seem to stabilize? 
Include a listing of the program or keystrokes used to 
produce the graphs for (c) and (d). 


18. A herd of American buffalo (bison) can be modeled by a stage 
matrix similar to that for the spotted owls. The females can be 
divided into calves (up to 1 year old), yearlings (1 to 2 years), 
and adults. Suppose an average of 42 female calves are
born each year per 100 adult females. (Only adults produce
offspring.) Each year, about 60% of the calves survive, 75%
of the yearlings survive, and 95% of the adults survive. For
$k \ge 0$, let $\mathbf{x}_k = (c_k, y_k, a_k)$, where the entries in $\mathbf{x}_k$ are the
numbers of females in each life stage at year $k$.

a. Construct the stage-matrix $A$ for the buffalo herd, such
that $\mathbf{x}_{k+1} = A\mathbf{x}_k$ for $k \ge 0$.

b. [M] Show that the buffalo herd is growing, determine 
the expected growth rate after many years, and give the 
expected numbers of calves and yearlings present per 100 
adults. 
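SOLUTIONS TO PRACTICE PROBLEMS


1. The first step is to write $\mathbf{x}_0$ as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Row reduction
of $[\,\mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3 \ \mathbf{x}_0\,]$ produces the weights $c_1 = 2$, $c_2 = 1$, and $c_3 = 3$, so that

$$\mathbf{x}_0 = 2\mathbf{v}_1 + 1\mathbf{v}_2 + 3\mathbf{v}_3$$

Since the eigenvalues are 1, $\frac{2}{3}$, and $\frac{1}{3}$, the general solution is

$$\mathbf{x}_k = 2 \cdot 1^k\mathbf{v}_1 + 1 \cdot \left(\tfrac{2}{3}\right)^k\mathbf{v}_2 + 3 \cdot \left(\tfrac{1}{3}\right)^k\mathbf{v}_3 = 2\begin{bmatrix} -2 \\ 2 \\ 1 \end{bmatrix} + \left(\tfrac{2}{3}\right)^k\begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} + 3 \cdot \left(\tfrac{1}{3}\right)^k\begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix} \tag{12}$$

2. As $k \to \infty$, the second and third terms in (12) tend to the zero vector, and

$$\mathbf{x}_k = 2\mathbf{v}_1 + \left(\tfrac{2}{3}\right)^k\mathbf{v}_2 + 3\left(\tfrac{1}{3}\right)^k\mathbf{v}_3 \to 2\mathbf{v}_1 = \begin{bmatrix} -4 \\ 4 \\ 2 \end{bmatrix}$$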

5.7 APPLICATIONS TO DIFFERENTIAL EQUATIONS

This section describes continuous analogues of the difference equations studied in
Section 5.6. In many applied problems, several quantities are varying continuously
in time, and they are related by a system of differential equations:

$$\begin{aligned} x_1' &= a_{11}x_1 + \cdots + a_{1n}x_n \\ x_2' &= a_{21}x_1 + \cdots + a_{2n}x_n \\ &\ \,\vdots \\ x_n' &= a_{n1}x_1 + \cdots + a_{nn}x_n \end{aligned}$$

Here $x_1, \ldots, x_n$ are differentiable functions of $t$, with derivatives $x_1', \ldots, x_n'$, and the $a_{ij}$
are constants. The crucial feature of this system is that it is linear. To see this, write the
system as a matrix differential equation

$$\mathbf{x}'(t) = A\mathbf{x}(t) \tag{1}$$

where

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}, \quad \mathbf{x}'(t) = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}, \quad \text{and} \quad A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}$$

A solution of equation (1) is a vector-valued function that satisfies (1) for all $t$ in some
interval of real numbers, such as $t \ge 0$.

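Equation (1) is linear because both differentiation of functions and multiplication of
vectors by a matrix are linear transformations. Thus, if $\mathbf{u}$ and $\mathbf{v}$ are solutions of $\mathbf{x}' = A\mathbf{x}$,
then $c\mathbf{u} + d\mathbf{v}$ is also a solution, because

$$(c\mathbf{u} + d\mathbf{v})' = c\mathbf{u}' + d\mathbf{v}' = cA\mathbf{u} + dA\mathbf{v} = A(c\mathbf{u} + d\mathbf{v})$$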
(Engineers call this property superposition of solutions.) Also, the identically zero
function is a (trivial) solution of (1). In the terminology of Chapter 4, the set of all
solutions of (1) is a subspace of the set of all continuous functions with values in $\mathbb{R}^n$.

Standard texts on differential equations show that there always exists what is
called a fundamental set of solutions to (1). If $A$ is $n \times n$, then there are $n$ linearly
independent functions in a fundamental set, and each solution of (1) is a unique linear
combination of these $n$ functions. That is, a fundamental set of solutions is a basis for
the set of all solutions of (1), and the solution set is an $n$-dimensional vector space of
functions. If a vector $\mathbf{x}_0$ is specified, then the initial value problem is to construct the
(unique) function $\mathbf{x}$ such that $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{x}(0) = \mathbf{x}_0$.

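When $A$ is a diagonal matrix, the solutions of (1) can be produced by elementary
calculus. For instance, consider

$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & -5 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} \tag{2}$$

that is,

$$\begin{aligned} x_1'(t) &= 3x_1(t) \\ x_2'(t) &= -5x_2(t) \end{aligned} \tag{3}$$

The system (2) is said to be decoupled because each derivative of a function depends
only on the function itself, not on some combination or “coupling” of both $x_1(t)$ and
$x_2(t)$. From calculus, the solutions of (3) are $x_1(t) = c_1e^{3t}$ and $x_2(t) = c_2e^{-5t}$, for any
constants $c_1$ and $c_2$. Each solution of equation (2) can be written in the form

$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} c_1e^{3t} \\ c_2e^{-5t} \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix}e^{3t} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix}e^{-5t}$$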

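This example suggests that for the general equation $\mathbf{x}' = A\mathbf{x}$, a solution might be a
linear combination of functions of the form

$$\mathbf{x}(t) = \mathbf{v}e^{\lambda t} \tag{4}$$

for some scalar $\lambda$ and some fixed nonzero vector $\mathbf{v}$. [If $\mathbf{v} = \mathbf{0}$, the function $\mathbf{x}(t)$ is
identically zero and hence satisfies $\mathbf{x}' = A\mathbf{x}$.] Observe that

$\mathbf{x}'(t) = \lambda\mathbf{v}e^{\lambda t}$    By calculus, since $\mathbf{v}$ is a constant vector
$A\mathbf{x}(t) = A\mathbf{v}e^{\lambda t}$    Multiplying both sides of (4) by $A$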


FIGURE 1 


Since $e^{\lambda t}$ is never zero, $\mathbf{x}'(t)$ will equal $A\mathbf{x}(t)$ if and only if $\lambda\mathbf{v} = A\mathbf{v}$, that is, if and
only if $\lambda$ is an eigenvalue of $A$ and $\mathbf{v}$ is a corresponding eigenvector. Thus each
eigenvalue-eigenvector pair provides a solution (4) of $\mathbf{x}' = A\mathbf{x}$. Such solutions are
sometimes called eigenfunctions of the differential equation. Eigenfunctions provide
the key to solving systems of differential equations.


EXAMPLE 1 The circuit in Fig. 1 can be described by the differential equation

$$\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} -(1/R_1 + 1/R_2)/C_1 & 1/(R_2C_1) \\ 1/(R_2C_2) & -1/(R_2C_2) \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$

where $x_1(t)$ and $x_2(t)$ are the voltages across the two capacitors at time $t$. Suppose
resistor $R_1$ is 1 ohm, $R_2$ is 2 ohms, capacitor $C_1$ is 1 farad, and $C_2$ is .5 farad, and
suppose there is an initial charge of 5 volts on capacitor $C_1$ and 4 volts on capacitor $C_2$.
Find formulas for $x_1(t)$ and $x_2(t)$ that describe how the voltages change over time.
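SOLUTION Let $A$ denote the matrix displayed above, and let $\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$. For the
data given, $A = \begin{bmatrix} -1.5 & .5 \\ 1 & -1 \end{bmatrix}$, and $\mathbf{x}(0) = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$. The eigenvalues of $A$ are $\lambda_1 = -.5$
and $\lambda_2 = -2$, with corresponding eigenvectors

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

The eigenfunctions $\mathbf{x}_1(t) = \mathbf{v}_1e^{\lambda_1t}$ and $\mathbf{x}_2(t) = \mathbf{v}_2e^{\lambda_2t}$ both satisfy $\mathbf{x}' = A\mathbf{x}$, and so does
any linear combination of $\mathbf{x}_1$ and $\mathbf{x}_2$. Set

$$\mathbf{x}(t) = c_1\mathbf{v}_1e^{\lambda_1t} + c_2\mathbf{v}_2e^{\lambda_2t} = c_1\begin{bmatrix} 1 \\ 2 \end{bmatrix}e^{-.5t} + c_2\begin{bmatrix} -1 \\ 1 \end{bmatrix}e^{-2t}$$

and note that $\mathbf{x}(0) = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$. Since $\mathbf{v}_1$ and $\mathbf{v}_2$ are obviously linearly independent
and hence span $\mathbb{R}^2$, $c_1$ and $c_2$ can be found to make $\mathbf{x}(0)$ equal to $\mathbf{x}_0$. In fact, the equation

$$c_1\underbrace{\begin{bmatrix} 1 \\ 2 \end{bmatrix}}_{\mathbf{v}_1} + c_2\underbrace{\begin{bmatrix} -1 \\ 1 \end{bmatrix}}_{\mathbf{v}_2} = \underbrace{\begin{bmatrix} 5 \\ 4 \end{bmatrix}}_{\mathbf{x}_0}$$

leads easily to $c_1 = 3$ and $c_2 = -2$. Thus the desired solution of the differential equation
$\mathbf{x}' = A\mathbf{x}$ is

$$\mathbf{x}(t) = 3\begin{bmatrix} 1 \\ 2 \end{bmatrix}e^{-.5t} - 2\begin{bmatrix} -1 \\ 1 \end{bmatrix}e^{-2t}$$

or

$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} 3e^{-.5t} + 2e^{-2t} \\ 6e^{-.5t} - 2e^{-2t} \end{bmatrix}$$

Figure 2 shows the graph, or trajectory, of $\mathbf{x}(t)$, for $t \ge 0$, along with trajectories for
some other initial points. The trajectories of the two eigenfunctions $\mathbf{x}_1$ and $\mathbf{x}_2$ lie in the
eigenspaces of $A$.

The functions $\mathbf{x}_1$ and $\mathbf{x}_2$ both decay to zero as $t \to \infty$, but the values of $\mathbf{x}_2$
decay faster because its exponent is more negative. The entries in the corresponding
eigenvector $\mathbf{v}_2$ show that the voltages across the capacitors will decay to zero as rapidly
as possible if the initial voltages are equal in magnitude but opposite in sign. ■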
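The solution found in Example 1 can be checked directly against the differential equation. The sketch below builds $A$ from the component values, differentiates the stated formulas term by term, and measures how far $\mathbf{x}'(t)$ is from $A\mathbf{x}(t)$. The function names `x`, `x_prime`, and `residual` and the sampled time grid are assumptions made for illustration.

```python
import math

R1, R2, C1, C2 = 1.0, 2.0, 1.0, 0.5
A = [[-(1 / R1 + 1 / R2) / C1, 1 / (R2 * C1)],
     [1 / (R2 * C2), -1 / (R2 * C2)]]          # evaluates to [[-1.5, 0.5], [1.0, -1.0]]


def x(t):
    """Voltages from Example 1: x(t) = 3 v1 e^{-.5t} - 2 v2 e^{-2t}."""
    return (3 * math.exp(-0.5 * t) + 2 * math.exp(-2 * t),
            6 * math.exp(-0.5 * t) - 2 * math.exp(-2 * t))


def x_prime(t):
    """Term-by-term derivative of the two formulas in x(t)."""
    return (-1.5 * math.exp(-0.5 * t) - 4 * math.exp(-2 * t),
            -3.0 * math.exp(-0.5 * t) + 4 * math.exp(-2 * t))


def residual(t):
    """Largest entry of |x'(t) - A x(t)|; zero up to rounding for a true solution."""
    x1, x2 = x(t)
    ax = (A[0][0] * x1 + A[0][1] * x2, A[1][0] * x1 + A[1][1] * x2)
    return max(abs(d - a) for d, a in zip(x_prime(t), ax))


print(A, x(0.0), max(residual(t / 10) for t in range(50)))
```

The residual is at the level of floating-point rounding on the sampled interval, and $\mathbf{x}(0)$ reproduces the initial voltages of 5 and 4 volts.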



FIGURE 2 The origin as an attractor. 
In Fig. 2, the origin is called an attractor, or sink, of the dynamical system because
all trajectories are drawn into the origin. The direction of greatest attraction is along the
trajectory of the eigenfunction $\mathbf{x}_2$ (along the line through $\mathbf{0}$ and $\mathbf{v}_2$) corresponding to
the more negative eigenvalue, $\lambda = -2$. Trajectories that begin at points not on this line
become asymptotic to the line through $\mathbf{0}$ and $\mathbf{v}_1$ because their components in the $\mathbf{v}_2$
direction decay so rapidly.

If the eigenvalues in Example 1 were positive instead of negative, the corresponding
trajectories would be similar in shape, but the trajectories would be traversed away from
the origin. In such a case, the origin is called a repeller, or source, of the dynamical
system, and the direction of greatest repulsion is the line containing the trajectory of the
eigenfunction corresponding to the more positive eigenvalue.


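EXAMPLE 2 Suppose a particle is moving in a planar force field and its position
vector $\mathbf{x}$ satisfies $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{x}(0) = \mathbf{x}_0$, where

$$A = \begin{bmatrix} 4 & -5 \\ -2 & 1 \end{bmatrix}, \quad \mathbf{x}_0 = \begin{bmatrix} 2.9 \\ 2.6 \end{bmatrix}$$

Solve this initial value problem for $t \ge 0$, and sketch the trajectory of the particle.


SOLUTION The eigenvalues of $A$ turn out to be $\lambda_1 = 6$ and $\lambda_2 = -1$, with corresponding
eigenvectors $\mathbf{v}_1 = (-5, 2)$ and $\mathbf{v}_2 = (1, 1)$. For any constants $c_1$ and $c_2$, the function

$$\mathbf{x}(t) = c_1\mathbf{v}_1e^{\lambda_1t} + c_2\mathbf{v}_2e^{\lambda_2t} = c_1\begin{bmatrix} -5 \\ 2 \end{bmatrix}e^{6t} + c_2\begin{bmatrix} 1 \\ 1 \end{bmatrix}e^{-t}$$

is a solution of $\mathbf{x}' = A\mathbf{x}$. We want $c_1$ and $c_2$ to satisfy $\mathbf{x}(0) = \mathbf{x}_0$, that is,

$$c_1\begin{bmatrix} -5 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2.9 \\ 2.6 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} -5 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 2.9 \\ 2.6 \end{bmatrix}$$

Calculations show that $c_1 = -3/70$ and $c_2 = 188/70$, and so the desired function is

$$\mathbf{x}(t) = \frac{-3}{70}\begin{bmatrix} -5 \\ 2 \end{bmatrix}e^{6t} + \frac{188}{70}\begin{bmatrix} 1 \\ 1 \end{bmatrix}e^{-t}$$

Trajectories of $\mathbf{x}$ and other solutions are shown in Fig. 3. ■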

In Fig. 3, the origin is called a saddle point of the dynamical system because 
some trajectories approach the origin at first and then change direction and move away 
from the origin. A saddle point arises whenever the matrix A has both positive and 
negative eigenvalues. The direction of greatest repulsion is the line through $\mathbf{v}_1$ and $\mathbf{0}$,
corresponding to the positive eigenvalue. The direction of greatest attraction is the line
through $\mathbf{v}_2$ and $\mathbf{0}$, corresponding to the negative eigenvalue.
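The arithmetic in Example 2 can be checked numerically. The following is a minimal sketch, assuming the eigenvalues 6 and -1, the eigenvectors (-5, 2) and (1, 1), and the initial vector (2.9, 2.6) quoted in the example; the helper names `x`, `x_prime`, and `A_times` are illustrative only.

```python
import math

# Data taken from Example 2 (assumed as stated there).
A = [[4.0, -5.0], [-2.0, 1.0]]
lam1, v1 = 6.0, (-5.0, 2.0)
lam2, v2 = -1.0, (1.0, 1.0)
x0 = (2.9, 2.6)

# Solve c1*v1 + c2*v2 = x0 with Cramer's rule (a 2x2 system).
det = v1[0] * v2[1] - v2[0] * v1[1]
c1 = (x0[0] * v2[1] - v2[0] * x0[1]) / det
c2 = (v1[0] * x0[1] - x0[0] * v1[1]) / det

def x(t):
    """Eigenfunction expansion c1*v1*e^(lam1 t) + c2*v2*e^(lam2 t)."""
    e1, e2 = math.exp(lam1 * t), math.exp(lam2 * t)
    return [c1 * v1[i] * e1 + c2 * v2[i] * e2 for i in range(2)]

def x_prime(t):
    """Derivative of x(t), differentiated term by term."""
    e1, e2 = math.exp(lam1 * t), math.exp(lam2 * t)
    return [c1 * lam1 * v1[i] * e1 + c2 * lam2 * v2[i] * e2 for i in range(2)]

def A_times(u):
    """Matrix-vector product A u."""
    return [A[i][0] * u[0] + A[i][1] * u[1] for i in range(2)]

print(c1, c2)                         # compare with -3/70 and 188/70
print(x_prime(0.3), A_times(x(0.3)))  # the two vectors should agree
```

Comparing `x_prime(t)` with `A_times(x(t))` at a few values of t is a quick way to confirm that the expansion really solves the differential equation.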



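FIGURE 3 The origin as a saddle point.

Decoupling a Dynamical System

The following discussion shows that the method of Examples 1 and 2 produces a
fundamental set of solutions for any dynamical system described by $\mathbf{x}' = A\mathbf{x}$ when $A$
is $n \times n$ and has $n$ linearly independent eigenvectors, that is, when $A$ is diagonalizable.
Suppose the eigenfunctions for $A$ are

\[ \mathbf{v}_1 e^{\lambda_1 t}, \; \ldots, \; \mathbf{v}_n e^{\lambda_n t} \]

with $\mathbf{v}_1, \ldots, \mathbf{v}_n$ linearly independent eigenvectors. Let $P = [\,\mathbf{v}_1 \; \cdots \; \mathbf{v}_n\,]$, and let $D$
be the diagonal matrix with entries $\lambda_1, \ldots, \lambda_n$, so that $A = PDP^{-1}$. Now make a change
of variable, defining a new function $\mathbf{y}$ by

\[ \mathbf{y}(t) = P^{-1}\mathbf{x}(t) \quad\text{or, equivalently,}\quad \mathbf{x}(t) = P\mathbf{y}(t) \]

The equation $\mathbf{x}(t) = P\mathbf{y}(t)$ says that $\mathbf{y}(t)$ is the coordinate vector of $\mathbf{x}(t)$ relative to the
eigenvector basis. Substitution of $P\mathbf{y}$ for $\mathbf{x}$ in the equation $\mathbf{x}' = A\mathbf{x}$ gives

\[ \frac{d}{dt}(P\mathbf{y}) = A(P\mathbf{y}) = (PDP^{-1})P\mathbf{y} = PD\mathbf{y} \qquad (5) \]

Since $P$ is a constant matrix, the left side of (5) is $P\mathbf{y}'$. Left-multiply both sides of (5)
by $P^{-1}$ and obtain $\mathbf{y}' = D\mathbf{y}$, or

\[ \begin{bmatrix} y_1'(t) \\ y_2'(t) \\ \vdots \\ y_n'(t) \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{bmatrix} \begin{bmatrix} y_1(t) \\ y_2(t) \\ \vdots \\ y_n(t) \end{bmatrix} \]

The change of variable from $\mathbf{x}$ to $\mathbf{y}$ has decoupled the system of differential equations,
because the derivative of each scalar function $y_k$ depends only on $y_k$. (Review the analogous
change of variables in Section 5.6.) Since $y_1' = \lambda_1 y_1$, we have $y_1(t) = c_1 e^{\lambda_1 t}$,
with similar formulas for $y_2, \ldots, y_n$. Thus

\[ \mathbf{y}(t) = \begin{bmatrix} c_1 e^{\lambda_1 t} \\ \vdots \\ c_n e^{\lambda_n t} \end{bmatrix}, \quad\text{where}\quad \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = \mathbf{y}(0) = P^{-1}\mathbf{x}(0) = P^{-1}\mathbf{x}_0 \]

To obtain the general solution $\mathbf{x}$ of the original system, compute

\[ \mathbf{x}(t) = P\mathbf{y}(t) = [\,\mathbf{v}_1 \; \cdots \; \mathbf{v}_n\,]\,\mathbf{y}(t) = c_1\mathbf{v}_1 e^{\lambda_1 t} + \cdots + c_n\mathbf{v}_n e^{\lambda_n t} \]

This is the eigenfunction expansion constructed as in Example 1.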
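As a concrete illustration, the decoupling steps can be carried out for the data of Example 2. This is a worked sketch that assumes the eigenvalues 6 and -1, the eigenvectors (-5, 2) and (1, 1), and the initial vector (2.9, 2.6) from that example.

```latex
P = \begin{bmatrix} -5 & 1 \\ 2 & 1 \end{bmatrix}, \qquad
D = \begin{bmatrix} 6 & 0 \\ 0 & -1 \end{bmatrix}, \qquad
P^{-1} = -\frac{1}{7}\begin{bmatrix} 1 & -1 \\ -2 & -5 \end{bmatrix}

% Decoupled system y' = Dy: each equation involves only one unknown function.
y_1'(t) = 6\,y_1(t), \qquad y_2'(t) = -\,y_2(t)

% Initial coordinates relative to the eigenvector basis.
\mathbf{y}(0) = P^{-1}\mathbf{x}_0
  = -\frac{1}{7}\begin{bmatrix} 2.9 - 2.6 \\ -5.8 - 13 \end{bmatrix}
  = \begin{bmatrix} -3/70 \\ 188/70 \end{bmatrix}

% Solve each scalar equation, then return to the original coordinates.
\mathbf{y}(t) = \begin{bmatrix} -\tfrac{3}{70}\,e^{6t} \\[2pt] \tfrac{188}{70}\,e^{-t} \end{bmatrix},
\qquad
\mathbf{x}(t) = P\mathbf{y}(t)
  = -\frac{3}{70}\begin{bmatrix} -5 \\ 2 \end{bmatrix} e^{6t}
  + \frac{188}{70}\begin{bmatrix} 1 \\ 1 \end{bmatrix} e^{-t}
```

The coefficients agree with those found directly in Example 2, as expected, since solving $P\mathbf{c} = \mathbf{x}_0$ and computing $P^{-1}\mathbf{x}_0$ are the same calculation.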


Complex Eigenvalues 

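In the next example, a real matrix $A$ has a pair of complex eigenvalues $\lambda$ and $\bar{\lambda}$, with
associated complex eigenvectors $\mathbf{v}$ and $\bar{\mathbf{v}}$. (Recall from Section 5.5 that for a real matrix,
complex eigenvalues and associated eigenvectors come in conjugate pairs.) So two
solutions of $\mathbf{x}' = A\mathbf{x}$ are

\[ \mathbf{x}_1(t) = \mathbf{v}e^{\lambda t} \quad\text{and}\quad \mathbf{x}_2(t) = \bar{\mathbf{v}}e^{\bar{\lambda} t} \qquad (6) \]

It can be shown that $\mathbf{x}_2(t) = \overline{\mathbf{x}_1(t)}$ by using a power series representation for the
complex exponential function. Although the complex eigenfunctions $\mathbf{x}_1$ and $\mathbf{x}_2$ are
convenient for some calculations (particularly in electrical engineering), real functions
are more appropriate for many purposes. Fortunately, the real and imaginary parts of $\mathbf{x}_1$
are (real) solutions of $\mathbf{x}' = A\mathbf{x}$, because they are linear combinations of the solutions in
(6):

\[ \operatorname{Re}(\mathbf{v}e^{\lambda t}) = \tfrac{1}{2}\left[\mathbf{x}_1(t) + \overline{\mathbf{x}_1(t)}\right], \qquad \operatorname{Im}(\mathbf{v}e^{\lambda t}) = \tfrac{1}{2i}\left[\mathbf{x}_1(t) - \overline{\mathbf{x}_1(t)}\right] \]

FIGURE 4

To understand the nature of $\operatorname{Re}(\mathbf{v}e^{\lambda t})$, recall from calculus that for any number $x$,
the exponential function $e^x$ can be computed from the power series:

\[ e^x = 1 + x + \frac{1}{2!}x^2 + \cdots + \frac{1}{n!}x^n + \cdots \]

This series can be used to define $e^{\lambda t}$ when $\lambda$ is complex:

\[ e^{\lambda t} = 1 + (\lambda t) + \frac{1}{2!}(\lambda t)^2 + \cdots + \frac{1}{n!}(\lambda t)^n + \cdots \]

By writing $\lambda = a + bi$ (with $a$ and $b$ real), and using similar power series for the cosine
and sine functions, one can show that

\[ e^{(a+bi)t} = e^{at} \cdot e^{ibt} = e^{at}(\cos bt + i \sin bt) \qquad (7) \]

Hence

\[ \mathbf{v}e^{\lambda t} = (\operatorname{Re}\mathbf{v} + i\operatorname{Im}\mathbf{v}) \cdot e^{at}(\cos bt + i \sin bt) \]
\[ \phantom{\mathbf{v}e^{\lambda t}} = [\,(\operatorname{Re}\mathbf{v})\cos bt - (\operatorname{Im}\mathbf{v})\sin bt\,]\,e^{at} + i\,[\,(\operatorname{Re}\mathbf{v})\sin bt + (\operatorname{Im}\mathbf{v})\cos bt\,]\,e^{at} \]

So two real solutions of $\mathbf{x}' = A\mathbf{x}$ are

\[ \mathbf{y}_1(t) = \operatorname{Re}\mathbf{x}_1(t) = [\,(\operatorname{Re}\mathbf{v})\cos bt - (\operatorname{Im}\mathbf{v})\sin bt\,]\,e^{at} \]
\[ \mathbf{y}_2(t) = \operatorname{Im}\mathbf{x}_1(t) = [\,(\operatorname{Re}\mathbf{v})\sin bt + (\operatorname{Im}\mathbf{v})\cos bt\,]\,e^{at} \]

It can be shown that $\mathbf{y}_1$ and $\mathbf{y}_2$ are linearly independent functions (when $b \neq 0$).¹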
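The closed-form expressions for the real and imaginary parts can be compared with a direct complex computation. The sketch below uses Python's `cmath` module; the vector `v`, the value `lam`, and the time `t` are illustrative choices, and `real_solutions` is a hypothetical helper name.

```python
import cmath
import math

def real_solutions(v, lam, t):
    """Return (y1, y2), the real and imaginary parts of v*e^(lam t),
    computed from the cos/sin formulas with lam = a + bi."""
    a, b = lam.real, lam.imag
    growth = math.exp(a * t)
    c, s = math.cos(b * t), math.sin(b * t)
    y1 = [(z.real * c - z.imag * s) * growth for z in v]
    y2 = [(z.real * s + z.imag * c) * growth for z in v]
    return y1, y2

v = [1j, 2 + 0j]          # illustrative complex vector
lam = complex(-2, 5)      # illustrative complex number a + bi
t = 0.37

direct = [z * cmath.exp(lam * t) for z in v]   # v*e^(lam t) computed directly
y1, y2 = real_solutions(v, lam, t)

print([d.real for d in direct], y1)   # should agree entry by entry
print([d.imag for d in direct], y2)
```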


EXAMPLE 3 The circuit in Fig. 4 can be described by the equation

\[ \begin{bmatrix} i_L' \\ v_C' \end{bmatrix} = \begin{bmatrix} -R_2/L & -1/L \\ 1/C & -1/(R_1C) \end{bmatrix} \begin{bmatrix} i_L \\ v_C \end{bmatrix} \]

where $i_L$ is the current passing through the inductor $L$ and $v_C$ is the voltage drop across
the capacitor $C$. Suppose $R_1$ is 5 ohms, $R_2$ is .8 ohm, $C$ is .1 farad, and $L$ is .4 henry.
Find formulas for $i_L$ and $v_C$, if the initial current through the inductor is 3 amperes and
the initial voltage across the capacitor is 3 volts.


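SOLUTION For the data given, $A = \begin{bmatrix} -2 & -2.5 \\ 10 & -2 \end{bmatrix}$ and $\mathbf{x}_0 = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$. The method
discussed in Section 5.5 produces the eigenvalue $\lambda = -2 + 5i$ and the corresponding
eigenvector $\mathbf{v}_1 = \begin{bmatrix} i \\ 2 \end{bmatrix}$. The complex solutions of $\mathbf{x}' = A\mathbf{x}$ are complex linear
combinations of

\[ \mathbf{x}_1(t) = \begin{bmatrix} i \\ 2 \end{bmatrix} e^{(-2+5i)t} \quad\text{and}\quad \mathbf{x}_2(t) = \begin{bmatrix} -i \\ 2 \end{bmatrix} e^{(-2-5i)t} \]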


¹ Since $\mathbf{x}_2(t)$ is the complex conjugate of $\mathbf{x}_1(t)$, the real and imaginary parts of $\mathbf{x}_2(t)$ are $\mathbf{y}_1(t)$ and $-\mathbf{y}_2(t)$,
respectively. Thus one can use either $\mathbf{x}_1(t)$ or $\mathbf{x}_2(t)$, but not both, to produce two real linearly independent
solutions of $\mathbf{x}' = A\mathbf{x}$.

FIGURE 5 The origin as a spiral point.


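Next, use equation (7) to write

\[ \mathbf{x}_1(t) = \begin{bmatrix} i \\ 2 \end{bmatrix} e^{-2t}(\cos 5t + i \sin 5t) \]

The real and imaginary parts of $\mathbf{x}_1$ provide real solutions:

\[ \mathbf{y}_1(t) = \begin{bmatrix} -\sin 5t \\ 2\cos 5t \end{bmatrix} e^{-2t}, \qquad \mathbf{y}_2(t) = \begin{bmatrix} \cos 5t \\ 2\sin 5t \end{bmatrix} e^{-2t} \]

Since $\mathbf{y}_1$ and $\mathbf{y}_2$ are linearly independent functions, they form a basis for the
two-dimensional real vector space of solutions of $\mathbf{x}' = A\mathbf{x}$. Thus the general solution is

\[ \mathbf{x}(t) = c_1 \begin{bmatrix} -\sin 5t \\ 2\cos 5t \end{bmatrix} e^{-2t} + c_2 \begin{bmatrix} \cos 5t \\ 2\sin 5t \end{bmatrix} e^{-2t} \]

To satisfy $\mathbf{x}(0) = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$, we need $c_1 \begin{bmatrix} 0 \\ 2 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$, which leads to $c_1 = 1.5$ and
$c_2 = 3$. Thus

\[ \mathbf{x}(t) = 1.5 \begin{bmatrix} -\sin 5t \\ 2\cos 5t \end{bmatrix} e^{-2t} + 3 \begin{bmatrix} \cos 5t \\ 2\sin 5t \end{bmatrix} e^{-2t} \]

or

\[ \begin{bmatrix} i_L(t) \\ v_C(t) \end{bmatrix} = \begin{bmatrix} -1.5\sin 5t + 3\cos 5t \\ 3\cos 5t + 6\sin 5t \end{bmatrix} e^{-2t} \]

See Fig. 5. ■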

In Fig. 5, the origin is called a spiral point of the dynamical system. The rotation 
is caused by the sine and cosine functions that arise from a complex eigenvalue. The 
trajectories spiral inward because the factor $e^{-2t}$ tends to zero. Recall that $-2$ is the real
part of the eigenvalue in Example 3. When A has a complex eigenvalue with positive 
real part, the trajectories spiral outward. If the real part of the eigenvalue is zero, the 
trajectories form ellipses around the origin. 
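The classifications used in this section (attractor, repeller, saddle point, spiral point) depend only on the eigenvalues, so for a 2 × 2 matrix they can be read off from the trace and determinant. The following is a minimal sketch under that assumption; the function name `classify_origin`, the tolerance `tol`, and the returned labels are illustrative, and borderline cases such as a zero eigenvalue are simply reported as not covered.

```python
import cmath

def classify_origin(A, tol=1e-12):
    """Classify the origin for x' = Ax, with A a real 2x2 matrix,
    using the roots of lambda^2 - (tr A) lambda + det A = 0."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    lam1, lam2 = (tr + root) / 2, (tr - root) / 2
    if abs(lam1.imag) > tol:              # complex conjugate pair
        if lam1.real < -tol:
            return "spiral point (inward)"
        if lam1.real > tol:
            return "spiral point (outward)"
        return "ellipses around the origin"
    r1, r2 = lam1.real, lam2.real
    if r1 < 0 and r2 < 0:
        return "attractor"
    if r1 > 0 and r2 > 0:
        return "repeller"
    if r1 * r2 < 0:
        return "saddle point"
    return "not covered (zero eigenvalue)"

print(classify_origin([[4, -5], [-2, 1]]))      # matrix with eigenvalues 6 and -1
print(classify_origin([[-2, -2.5], [10, -2]]))  # matrix with eigenvalues -2 +/- 5i
```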


PRACTICE PROBLEMS 

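A real $3 \times 3$ matrix $A$ has eigenvalues $-.5$, $.2 + .3i$, and $.2 - .3i$, with corresponding
eigenvectors

\[ \mathbf{v}_1 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 + 2i \\ 4i \\ 2 \end{bmatrix}, \quad\text{and}\quad \mathbf{v}_3 = \begin{bmatrix} 1 - 2i \\ -4i \\ 2 \end{bmatrix} \]

1. Is $A$ diagonalizable as $A = PDP^{-1}$, using complex matrices?

2. Write the general solution of $\mathbf{x}' = A\mathbf{x}$ using complex eigenfunctions, and then find
the general real solution.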

3. Describe the shapes of typical trajectories. 


5.7 EXERCISES 

1. A particle moving in a planar force field has a position vector 
x that satisfies x r = Ax. The 2x2 matrix A has eigenvalues 

. . . 「一31 

4 and 2, with corresponding eigenvectors Vi = and 


\2 = i . Find the position of the particle at time t, 

. — 6 
assuming that x(0) = 1 . 























































2. Let $A$ be a $2 \times 2$ matrix with eigenvalues $-3$ and $-1$ and

. r -i 1 

corresponding eigenvectors Vi = ^ andv 2 = 

x(t) be the position of a particle at time t. Solve the initial 

[ 2 " 

value problem x’ = Ax, x(0 )=.. 



In Exercises 3-6, solve the initial value problem $\mathbf{x}'(t) = A\mathbf{x}(t)$
for $t \ge 0$, with $\mathbf{x}(0) = (3, 2)$. Classify the nature of the origin
as an attractor, repeller, or saddle point of the dynamical system
described by $\mathbf{x}' = A\mathbf{x}$. Find the directions of greatest attraction
and/or repulsion. When the origin is a saddle point, sketch typical 
trajectories. 

3. A = 

5. A = 


4. A 

6. A 


—6 -11 16 

16. [M] A = 2 5-4 

一 4 -5 10 _ 

30 64 23" 

17. [M] A = -11 -23 -9 

6 15 4_ 

"53 -30 -2" 

18. [M] A = 90 -52 -3 

20 —10 2 _ 

19. [M] Find formulas for the voltages $v_1$ and $v_2$ (as functions of
time $t$) for the circuit in Example 1, assuming that $R_1 = 1/5$
ohm, $R_2 = 1/3$ ohm, $C_1 = 4$ farads, $C_2 = 3$ farads, and the
initial charge on each capacitor is 4 volts.


20. [M] Find formulas for the voltages $v_1$ and $v_2$ for the circuit in
Example 1, assuming that $R_1 = 1/15$ ohm, $R_2 = 1/3$ ohm,
$C_1 = 9$ farads, $C_2 = 2$ farads, and the initial charge on each
capacitor is 3 volts.


In Exercises 7 and 8, make a change of variable that decouples
the equation $\mathbf{x}' = A\mathbf{x}$. Write the equation $\mathbf{x}(t) = P\mathbf{y}(t)$ and
show the calculation that leads to the uncoupled system $\mathbf{y}' = D\mathbf{y}$,
specifying $P$ and $D$.


21. [M] Find formulas for the current $i_L$ and the voltage $v_C$
for the circuit in Example 3, assuming that $R_1 = 1$ ohm,
$R_2 = .125$ ohm, $C = .2$ farad, $L = .125$ henry, the initial
current is 0 amp, and the initial voltage is 15 volts.


7. A as in Exercise 5 8. A as in Exercise 6 


22. [M] The circuit in the figure is described by the equation 


In Exercises 9-18, construct the general solution of $\mathbf{x}' = A\mathbf{x}$
involving complex eigenfunctions and then obtain the general real
solution. Describe the shapes of typical trajectories.


9. A = 

11. A = 


-3 2 

-1 -1 

-3 -9 
2 3 


10. A = 

12. A = 


-2 1 

-7 10 

-4 5 


13. A = 



14. A 


15. [M] A = 


-8 -12 -6 
2 1 2 
7 12 5 



\[ \begin{bmatrix} i_L' \\ v_C' \end{bmatrix} = \begin{bmatrix} 0 & 1/L \\ -1/C & -1/(RC) \end{bmatrix} \begin{bmatrix} i_L \\ v_C \end{bmatrix} \]

where $i_L$ is the current through the inductor $L$ and $v_C$ is the
voltage drop across the capacitor $C$. Find formulas for $i_L$
and $v_C$ when $R = .5$ ohm, $C = 2.5$ farads, $L = .5$ henry,
the initial current is 0 amp, and the initial voltage is 12 volts.

SOLUTIONS TO PRACTICE PROBLEMS 

1. Yes, the 3x3 matrix is diagonalizable because it has three distinct eigenvalues. 
Theorem 2 in Section 5.1 and Theorem 5 in Section 5.3 are valid when complex 
scalars are used. (The proofs are essentially the same as for real scalars.) 

2. The general solution has the form

\[ \mathbf{x}(t) = c_1 \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} e^{-.5t} + c_2 \begin{bmatrix} 1 + 2i \\ 4i \\ 2 \end{bmatrix} e^{(.2+.3i)t} + c_3 \begin{bmatrix} 1 - 2i \\ -4i \\ 2 \end{bmatrix} e^{(.2-.3i)t} \]

The scalars $c_1, c_2, c_3$ here can be any complex numbers. The first term in $\mathbf{x}(t)$ is real.
Two more real solutions can be produced using the real and imaginary parts of the
second term in $\mathbf{x}(t)$:

\[ \begin{bmatrix} 1 + 2i \\ 4i \\ 2 \end{bmatrix} e^{.2t}(\cos .3t + i \sin .3t) \]

The general real solution has the following form, with real scalars $c_1, c_2, c_3$:

\[ c_1 \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} e^{-.5t} + c_2 \begin{bmatrix} \cos .3t - 2\sin .3t \\ -4\sin .3t \\ 2\cos .3t \end{bmatrix} e^{.2t} + c_3 \begin{bmatrix} \sin .3t + 2\cos .3t \\ 4\cos .3t \\ 2\sin .3t \end{bmatrix} e^{.2t} \]

3. Any solution with $c_2 = c_3 = 0$ is attracted to the origin because of the negative
exponential factor. Other solutions have components that grow without bound, and
the trajectories spiral outward.

Be careful not to mistake this problem for one in Section 5.6. There the condition
for attraction toward $\mathbf{0}$ was that an eigenvalue be less than 1 in magnitude, to make
$|\lambda|^k \to 0$. Here the real part of the eigenvalue must be negative, to make $e^{\lambda t} \to 0$.


5.8 ITERATIVE ESTIMATES FOR EIGENVALUES 


In scientific applications of linear algebra, eigenvalues are seldom known precisely. 
Fortunately, a close numerical approximation is usually quite satisfactory. In fact, some 
applications require only a rough approximation to the largest eigenvalue. The first 
algorithm described below can work well for this case. Also, it provides a foundation 
for a more powerful method that can give fast estimates for other eigenvalues as well. 


The Power Method 


The power method applies to an $n \times n$ matrix $A$ with a strictly dominant eigenvalue
$\lambda_1$, which means that $\lambda_1$ must be larger in absolute value than all the other eigenvalues.
In this case, the power method produces a scalar sequence that approaches $\lambda_1$ and a
vector sequence that approaches a corresponding eigenvector. The background for the
method rests on the eigenvector decomposition used at the beginning of Section 5.6.

Assume for simplicity that $A$ is diagonalizable and $\mathbb{R}^n$ has a basis of eigenvectors
$\mathbf{v}_1, \ldots, \mathbf{v}_n$, arranged so their corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$ decrease in size, with
the strictly dominant eigenvalue first. That is,

\[ |\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n| \qquad (1) \]

where only the first inequality is required to be strict. As we saw in equation (2) of
Section 5.6, if $\mathbf{x}$ in $\mathbb{R}^n$ is written as $\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n$, then

\[ A^k\mathbf{x} = c_1(\lambda_1)^k\mathbf{v}_1 + c_2(\lambda_2)^k\mathbf{v}_2 + \cdots + c_n(\lambda_n)^k\mathbf{v}_n \qquad (k = 1, 2, \ldots) \]

Assume $c_1 \neq 0$. Then, dividing by $(\lambda_1)^k$,

\[ \frac{1}{(\lambda_1)^k}A^k\mathbf{x} = c_1\mathbf{v}_1 + c_2\left(\frac{\lambda_2}{\lambda_1}\right)^k\mathbf{v}_2 + \cdots + c_n\left(\frac{\lambda_n}{\lambda_1}\right)^k\mathbf{v}_n \qquad (k = 1, 2, \ldots) \qquad (2) \]

From inequality (1), the fractions $\lambda_2/\lambda_1, \ldots, \lambda_n/\lambda_1$ are all less than 1 in magnitude and
so their powers go to zero. Hence

\[ (\lambda_1)^{-k}A^k\mathbf{x} \to c_1\mathbf{v}_1 \quad\text{as } k \to \infty \qquad (3) \]

Thus, for large $k$, a scalar multiple of $A^k\mathbf{x}$ determines almost the same direction as the
eigenvector $c_1\mathbf{v}_1$. Since positive scalar multiples do not change the direction of a vector,
$A^k\mathbf{x}$ itself points almost in the same direction as $\mathbf{v}_1$ or $-\mathbf{v}_1$, provided $c_1 \neq 0$.


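EXAMPLE 1 Let $A = \begin{bmatrix} 1.8 & .8 \\ .2 & 1.2 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$, and $\mathbf{x} = \begin{bmatrix} -.5 \\ 1 \end{bmatrix}$. Then $A$ has
eigenvalues 2 and 1, and the eigenspace for $\lambda_1 = 2$ is the line through $\mathbf{0}$ and $\mathbf{v}_1$. For
$k = 0, \ldots, 8$, compute $A^k\mathbf{x}$ and construct the line through $\mathbf{0}$ and $A^k\mathbf{x}$. What happens as
$k$ increases?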


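SOLUTION The first three calculations are

\[ A\mathbf{x} = \begin{bmatrix} 1.8 & .8 \\ .2 & 1.2 \end{bmatrix} \begin{bmatrix} -.5 \\ 1 \end{bmatrix} = \begin{bmatrix} -.1 \\ 1.1 \end{bmatrix} \]
\[ A^2\mathbf{x} = A(A\mathbf{x}) = \begin{bmatrix} 1.8 & .8 \\ .2 & 1.2 \end{bmatrix} \begin{bmatrix} -.1 \\ 1.1 \end{bmatrix} = \begin{bmatrix} .7 \\ 1.3 \end{bmatrix} \]
\[ A^3\mathbf{x} = A(A^2\mathbf{x}) = \begin{bmatrix} 1.8 & .8 \\ .2 & 1.2 \end{bmatrix} \begin{bmatrix} .7 \\ 1.3 \end{bmatrix} = \begin{bmatrix} 2.3 \\ 1.7 \end{bmatrix} \]

Analogous calculations complete Table 1.


TABLE 1 Iterates of a Vector

k               0     1     2     3     4     5      6      7      8
A^k x (top)    -.5   -.1    .7   2.3   5.5   11.9   24.7   50.3   101.5
A^k x (bottom)   1   1.1   1.3   1.7   2.5    4.1    7.3   13.7    26.5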


The vectors $\mathbf{x}, A\mathbf{x}, \ldots, A^4\mathbf{x}$ are shown in Fig. 1. The other vectors are growing
too long to display. However, line segments are drawn showing the directions of those
vectors. In fact, the directions of the vectors are what we really want to see, not the
vectors themselves. The lines seem to be approaching the line representing the eigenspace
spanned by $\mathbf{v}_1$. More precisely, the angle between the line (subspace) determined by
$A^k\mathbf{x}$ and the line (eigenspace) determined by $\mathbf{v}_1$ goes to zero as $k \to \infty$. ■
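The iterates in Table 1 and the shrinking angle can be reproduced with a few lines of code. This is a minimal sketch using the matrix, the eigenvector (4, 1), and the starting vector (-.5, 1) of Example 1; the helper names `matvec` and `line_angle` are illustrative.

```python
import math

A = [[1.8, 0.8], [0.2, 1.2]]
v1 = (4.0, 1.0)      # eigenvector for the eigenvalue 2
x = (-0.5, 1.0)      # starting vector

def matvec(M, u):
    """Matrix-vector product M u."""
    return tuple(sum(m * c for m, c in zip(row, u)) for row in M)

def line_angle(u, w):
    """Angle (radians) between the lines through 0 spanned by u and by w."""
    cosine = abs(u[0] * w[0] + u[1] * w[1]) / (math.hypot(*u) * math.hypot(*w))
    return math.acos(min(1.0, cosine))

iterates = [x]
for _ in range(8):
    iterates.append(matvec(A, iterates[-1]))

angles = [line_angle(u, v1) for u in iterates]
for k, (u, angle) in enumerate(zip(iterates, angles)):
    print(k, u, round(math.degrees(angle), 2))
```

The printed angles need not shrink at every single step, but they tend to zero as k grows, which is the behavior described above.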


FIGURE 1 Directions determined by $\mathbf{x}, A\mathbf{x}, A^2\mathbf{x}, \ldots, A^7\mathbf{x}$.


The vectors $(\lambda_1)^{-k}A^k\mathbf{x}$ in (3) are scaled to make them converge to $c_1\mathbf{v}_1$, provided
$c_1 \neq 0$. We cannot scale $A^k\mathbf{x}$ in this way because we do not know $\lambda_1$. But we can scale
each $A^k\mathbf{x}$ to make its largest entry a 1. It turns out that the resulting sequence $\{\mathbf{x}_k\}$ will
converge to a multiple of $\mathbf{v}_1$ whose largest entry is 1. Figure 2 shows the scaled sequence
for Example 1. The eigenvalue $\lambda_1$ can be estimated from the sequence $\{\mathbf{x}_k\}$, too. When
$\mathbf{x}_k$ is close to an eigenvector for $\lambda_1$, the vector $A\mathbf{x}_k$ is close to $\lambda_1\mathbf{x}_k$, with each entry in
$A\mathbf{x}_k$ approximately $\lambda_1$ times the corresponding entry in $\mathbf{x}_k$. Because the largest entry in
$\mathbf{x}_k$ is 1, the largest entry in $A\mathbf{x}_k$ is close to $\lambda_1$. (Careful proofs of these statements are
omitted.)

FIGURE 2 Scaled multiples of $\mathbf{x}, A\mathbf{x}, A^2\mathbf{x}, \ldots, A^7\mathbf{x}$.


THE POWER METHOD FOR ESTIMATING A STRICTLY DOMINANT EIGENVALUE 

1. Select an initial vector $\mathbf{x}_0$ whose largest entry is 1.

2. For $k = 0, 1, \ldots$,

a. Compute $A\mathbf{x}_k$.

b. Let $\mu_k$ be an entry in $A\mathbf{x}_k$ whose absolute value is as large as possible.

c. Compute $\mathbf{x}_{k+1} = (1/\mu_k)A\mathbf{x}_k$.

3. For almost all choices of $\mathbf{x}_0$, the sequence $\{\mu_k\}$ approaches the dominant
eigenvalue, and the sequence $\{\mathbf{x}_k\}$ approaches a corresponding eigenvector.
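The steps of the power method translate almost line for line into code. The following is a minimal sketch in plain Python (not the MATLAB code used for the tables in this section); the function name `power_method` and its parameters are illustrative, and the sample matrix and starting vector are those of Example 2.

```python
def power_method(A, x0, steps):
    """Run `steps` iterations of the power method.

    Returns (mus, xs): the scaling factors mu_k and the vectors x_0, x_1, ...
    """
    x = list(x0)
    mus, xs = [], [x]
    for _ in range(steps):
        Ax = [sum(a * c for a, c in zip(row, x)) for row in A]  # step 2a
        mu = max(Ax, key=abs)        # step 2b: entry of largest absolute value
        x = [c / mu for c in Ax]     # step 2c: rescale so that entry becomes 1
        mus.append(mu)
        xs.append(x)
    return mus, xs

mus, xs = power_method([[6.0, 5.0], [1.0, 2.0]], [0.0, 1.0], 6)
for k, (mu, x) in enumerate(zip(mus, xs)):
    print(k, x, mu)    # x_k and mu_k for k = 0, ..., 5
```

Selecting `mu` with `max(..., key=abs)` keeps the sign of the chosen entry, so dividing by it makes that entry exactly 1, which matches the scaling rule in step 2.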


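EXAMPLE 2 Apply the power method to $A = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix}$ with $\mathbf{x}_0 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Stop
when $k = 5$, and estimate the dominant eigenvalue and a corresponding eigenvector
of $A$.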

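SOLUTION Calculations in this example and the next were made with MATLAB,
which computes with 16-digit accuracy, although we show only a few significant figures
here. To begin, compute $A\mathbf{x}_0$ and identify the largest entry $\mu_0$ in $A\mathbf{x}_0$:

\[ A\mathbf{x}_0 = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}, \qquad \mu_0 = 5 \]

Scale $A\mathbf{x}_0$ by $1/\mu_0$ to get $\mathbf{x}_1$, compute $A\mathbf{x}_1$, and identify the largest entry in $A\mathbf{x}_1$:

\[ \mathbf{x}_1 = \frac{1}{\mu_0}A\mathbf{x}_0 = \frac{1}{5}\begin{bmatrix} 5 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ .4 \end{bmatrix} \]
\[ A\mathbf{x}_1 = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ .4 \end{bmatrix} = \begin{bmatrix} 8 \\ 1.8 \end{bmatrix}, \qquad \mu_1 = 8 \]

Scale $A\mathbf{x}_1$ by $1/\mu_1$ to get $\mathbf{x}_2$, compute $A\mathbf{x}_2$, and identify the largest entry in $A\mathbf{x}_2$:

\[ \mathbf{x}_2 = \frac{1}{\mu_1}A\mathbf{x}_1 = \frac{1}{8}\begin{bmatrix} 8 \\ 1.8 \end{bmatrix} = \begin{bmatrix} 1 \\ .225 \end{bmatrix} \]
\[ A\mathbf{x}_2 = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ .225 \end{bmatrix} = \begin{bmatrix} 7.125 \\ 1.450 \end{bmatrix}, \qquad \mu_2 = 7.125 \]

Scale $A\mathbf{x}_2$ by $1/\mu_2$ to get $\mathbf{x}_3$, and so on. The results of MATLAB calculations for the
first five iterations are arranged in Table 2.


TABLE 2 The Power Method for Example 2

k      0        1          2                3                  4                  5
x_k    (0, 1)   (1, .4)    (1, .225)        (1, .2035)         (1, .2005)         (1, .20007)
Ax_k   (5, 2)   (8, 1.8)   (7.125, 1.450)   (7.0175, 1.4070)   (7.0025, 1.4010)   (7.00036, 1.40014)
μ_k    5        8          7.125            7.0175             7.0025             7.00036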


The evidence from Table 2 strongly suggests that $\{\mathbf{x}_k\}$ approaches $(1, .2)$ and $\{\mu_k\}$
approaches 7. If so, then $(1, .2)$ is an eigenvector and 7 is the dominant eigenvalue. This
is easily verified by computing

\[ A \begin{bmatrix} 1 \\ .2 \end{bmatrix} = \begin{bmatrix} 6 & 5 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ .2 \end{bmatrix} = \begin{bmatrix} 7 \\ 1.4 \end{bmatrix} = 7 \begin{bmatrix} 1 \\ .2 \end{bmatrix} \]

The sequence $\{\mu_k\}$ in Example 2 converged quickly to $\lambda_1 = 7$ because the second
eigenvalue of $A$ was much smaller. (In fact, $\lambda_2 = 1$.) In general, the rate of convergence
depends on the ratio $|\lambda_2/\lambda_1|$, because the vector $c_2(\lambda_2/\lambda_1)^k\mathbf{v}_2$ in equation (2) is the main
source of error when using a scaled version of $A^k\mathbf{x}$ as an estimate of $c_1\mathbf{v}_1$. (The other
fractions $\lambda_j/\lambda_1$ are likely to be smaller.) If $|\lambda_2/\lambda_1|$ is close to 1, then $\{\mu_k\}$ and $\{\mathbf{x}_k\}$
can converge very slowly, and other approximation methods may be preferred.

With the power method, there is a slight chance that the chosen initial vector $\mathbf{x}$
will have no component in the $\mathbf{v}_1$ direction (when $c_1 = 0$). But computer rounding
errors during the calculations of the $\mathbf{x}_k$ are likely to create a vector with at least a small
component in the direction of $\mathbf{v}_1$. If that occurs, the $\mathbf{x}_k$ will start to converge to a multiple
of $\mathbf{v}_1$.


The Inverse Power Method 

This method provides an approximation for any eigenvalue, provided a good initial
estimate $\alpha$ of the eigenvalue $\lambda$ is known. In this case, we let $B = (A - \alpha I)^{-1}$ and apply
the power method to $B$. It can be shown that if the eigenvalues of $A$ are $\lambda_1, \ldots, \lambda_n$, then
the eigenvalues of $B$ are

\[ \frac{1}{\lambda_1 - \alpha}, \quad \frac{1}{\lambda_2 - \alpha}, \quad \ldots, \quad \frac{1}{\lambda_n - \alpha} \]

and the corresponding eigenvectors are the same as those for $A$. (See Exercises 15 and
16.)

Suppose, for example, that $\alpha$ is closer to $\lambda_2$ than to the other eigenvalues of $A$.
Then $1/(\lambda_2 - \alpha)$ will be a strictly dominant eigenvalue of $B$. If $\alpha$ is really close to $\lambda_2$,
then $1/(\lambda_2 - \alpha)$ is much larger than the other eigenvalues of $B$, and the inverse power
method produces a very rapid approximation to $\lambda_2$ for almost all choices of $\mathbf{x}_0$. The
following algorithm gives the details.

THE INVERSE POWER METHOD FOR ESTIMATING AN EIGENVALUE $\lambda$ OF $A$

1. Select an initial estimate $\alpha$ sufficiently close to $\lambda$.

2. Select an initial vector $\mathbf{x}_0$ whose largest entry is 1.

3. For $k = 0, 1, \ldots$,

a. Solve $(A - \alpha I)\mathbf{y}_k = \mathbf{x}_k$ for $\mathbf{y}_k$.

b. Let $\mu_k$ be an entry in $\mathbf{y}_k$ whose absolute value is as large as possible.

c. Compute $\nu_k = \alpha + (1/\mu_k)$.

d. Compute $\mathbf{x}_{k+1} = (1/\mu_k)\mathbf{y}_k$.

4. For almost all choices of $\mathbf{x}_0$, the sequence $\{\nu_k\}$ approaches the eigenvalue $\lambda$
of $A$, and the sequence $\{\mathbf{x}_k\}$ approaches a corresponding eigenvector.
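The inverse power method can be sketched in the same style. The code below is a minimal illustration in plain Python, not the MATLAB computation behind Table 3; it solves each linear system with a small Gaussian elimination routine rather than an LU factorization, and the names `solve` and `inverse_power_method` are illustrative. The sample data (the 3 × 3 matrix with rows (10, -8, -4), (-8, 13, 4), (-4, 5, 4), the estimate 1.9, and the starting vector (1, 1, 1)) are taken from Example 3.

```python
def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):          # back substitution
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y

def inverse_power_method(A, alpha, x0, steps):
    """Return (nus, x): the estimates nu_k and the final scaled vector."""
    n = len(A)
    shifted = [[A[i][j] - (alpha if i == j else 0.0) for j in range(n)]
               for i in range(n)]         # the matrix A - alpha*I
    x = list(x0)
    nus = []
    for _ in range(steps):
        y = solve(shifted, x)             # step 3a
        mu = max(y, key=abs)              # step 3b
        nus.append(alpha + 1.0 / mu)      # step 3c
        x = [c / mu for c in y]           # step 3d
    return nus, x

A = [[10.0, -8.0, -4.0], [-8.0, 13.0, 4.0], [-4.0, 5.0, 4.0]]
nus, x = inverse_power_method(A, 1.9, [1.0, 1.0, 1.0], 5)
print(nus)   # estimates nu_0, ..., nu_4
print(x)     # approximate eigenvector with largest entry 1
```

Re-solving the system from scratch at every step keeps the sketch short; for larger matrices, factoring $A - \alpha I$ once and reusing the factors would avoid repeated elimination.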


Notice that $B$, or rather $(A - \alpha I)^{-1}$, does not appear in the algorithm. Instead of
computing $(A - \alpha I)^{-1}\mathbf{x}_k$ to get the next vector in the sequence, it is better to solve
the equation $(A - \alpha I)\mathbf{y}_k = \mathbf{x}_k$ for $\mathbf{y}_k$ (and then scale $\mathbf{y}_k$ to produce $\mathbf{x}_{k+1}$). Since this
equation for $\mathbf{y}_k$ must be solved for each $k$, an LU factorization of $A - \alpha I$ will speed up
the process.

EXAMPLE 3 It is not uncommon in some applications to need to know the smallest 
eigenvalue of a matrix A and to have at hand rough estimates of the eigenvalues. 
Suppose 21, 3.3, and 1.9 are estimates for the eigenvalues of the matrix $A$ below. Find
the smallest eigenvalue, accurate to six decimal places.

\[ A = \begin{bmatrix} 10 & -8 & -4 \\ -8 & 13 & 4 \\ -4 & 5 & 4 \end{bmatrix} \]

SOLUTION The two smallest eigenvalues seem close together, so we use the inverse
power method for $A - 1.9I$. Results of a MATLAB calculation are shown in Table 3.
Here $\mathbf{x}_0$ was chosen arbitrarily, $\mathbf{y}_k = (A - 1.9I)^{-1}\mathbf{x}_k$, $\mu_k$ is the largest entry in $\mathbf{y}_k$,
$\nu_k = 1.9 + 1/\mu_k$, and $\mathbf{x}_{k+1} = (1/\mu_k)\mathbf{y}_k$. As it turns out, the initial eigenvalue estimate
was fairly good, and the inverse power sequence converged quickly. The smallest
eigenvalue is exactly 2. ■


TABLE 3 The Inverse Power Method

k      0                   1                         2                         3                         4
x_k    (1, 1, 1)           (.5736, .0646, 1)         (.5054, .0045, 1)         (.5004, .0003, 1)         (.50003, .00002, 1)
y_k    (4.45, .50, 7.76)   (5.0131, .0442, 9.9197)   (5.0012, .0031, 9.9949)   (5.0001, .0002, 9.9996)   (5.000006, .000015, 9.999975)
μ_k    7.76                9.9197                    9.9949                    9.9996                    9.999975
ν_k    2.03                2.0008                    2.00005                   2.000004                  2.0000002


If an estimate for the smallest eigenvalue of a matrix is not available, one can simply
take $\alpha = 0$ in the inverse power method. This choice of $\alpha$ works reasonably well if the
smallest eigenvalue is much closer to zero than to the other eigenvalues.

The two algorithms presented in this section are practical tools for many simple
situations, and they provide an introduction to the problem of eigenvalue estimation. A
more robust and widely used iterative method is the QR algorithm. For instance, it is
the heart of the MATLAB command eig(A), which rapidly computes eigenvalues and
eigenvectors of A. A brief description of the QR algorithm was given in the exercises 
for Section 5.2. Further details are presented in most modern numerical analysis texts. 


PRACTICE PROBLEM 

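How can you tell if a given vector $\mathbf{x}$ is a good approximation to an eigenvector of a
matrix $A$? If it is, how would you estimate the corresponding eigenvalue? Experiment
with

\[ A = \begin{bmatrix} 5 & 8 & 4 \\ 8 & 3 & -1 \\ 4 & -1 & 2 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} 1.0 \\ -4.3 \\ 8.1 \end{bmatrix} \]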


5.8 EXERCISES 


In Exercises 1-4, the matrix A is followed by a sequence {xk} 
produced by the power method. Use these data to estimate the 
largest eigenvalue of A, and give a corresponding eigenvector. 


6. Let $A = \begin{bmatrix} -2 & -3 \\ 6 & 7 \end{bmatrix}$. Repeat Exercise 5, using the following sequence $\mathbf{x}, A\mathbf{x}, \ldots, A^5\mathbf{x}$.


A : 


4 


1 


-5 


-29 


— 125 


-509 


1 


13 

' 

61 


253 

' 

1021 

■ 


-2045 

4093 


1 


1 


1 


1 


0 

' 

.25 

■ 

•3158 

■ 

.3298 

■■ 


.3326 


2. A : 


1.8 


-.8 


-3.2 4.2 


_ r 


'-.5625' 


"-.3021' 


'-.2601' 


'-.2520" 

_ 0 _ 

' 

1 

' 

1 

* 

1 

' 

1 


[M] Exercises 7-12 require MATLAB or other computational aid.
In Exercises 7 and 8, use the power method with the $\mathbf{x}_0$ given. List
$\{\mathbf{x}_k\}$ and $\{\mu_k\}$ for $k = 1, \ldots, 5$. In Exercises 9 and 10, list $\mu_5$ and
$\mu_6$.


7. A 


8 . A 


"6 

7' 


_ r 

_8 

5_ 

， Xq = 

_0_ 

"2 

r 


_ r 

4 

5 

， Xo = 

0 


3. A 


.2 

.7 


1 


1 


.6875 


.5577 


0 


.8 

■ 

1 

■ 

1 

’ 


.5188 



"8 0 

12" 


"1" 

9. A = 

1 -2 

1 

,x 0 = 

0 


0 3 

0 


0 





_1 2 -2' 


丁 

4. A = 

'4.1 -6" 

3 一 4.4 

10. A = 

1 1 9 

0 1 9 

， Xo = 

0 

0 


1 


1 


1 


1 


1 

' 

.7368 

' 

.7541 

' 

.7490 



.7502 


Another estimate can be made for an eigenvalue when an approximate
eigenvector is available. Observe that if $A\mathbf{x} = \lambda\mathbf{x}$, then
$\mathbf{x}^TA\mathbf{x} = \mathbf{x}^T(\lambda\mathbf{x}) = \lambda(\mathbf{x}^T\mathbf{x})$, and the Rayleigh quotient


5. Let A 


31 

-41 


15 16 

-20 -21 


.The vectors x, … ， A 5 x i 


\[ R(\mathbf{x}) = \frac{\mathbf{x}^TA\mathbf{x}}{\mathbf{x}^T\mathbf{x}} \]



-191 


991 


-4991 


■ 

241 

■ 

-1241 

■ 

6241 

' 


24991 

-31241 


Find a vector with a 1 in the second entry that is close to 
an eigenvector of A. Use four decimal places. Check your 
estimate, and give an estimate for the dominant eigenvalue 
of A. 


equals $\lambda$. If $\mathbf{x}$ is close to an eigenvector for $\lambda$, then this quotient
is close to $\lambda$. When $A$ is a symmetric matrix ($A^T = A$), the
Rayleigh quotient $R(\mathbf{x}_k) = (\mathbf{x}_k^TA\mathbf{x}_k)/(\mathbf{x}_k^T\mathbf{x}_k)$ will have roughly
twice as many digits of accuracy as the scaling factor $\mu_k$ in the
power method. Verify this increased accuracy in Exercises 11 and
12 by computing $\mu_k$ and $R(\mathbf{x}_k)$ for $k = 1, \ldots, 4$.
11.



,x 0 : 


12 . 



,x 0 


Exercises 13 and 14 apply to a 3 x 3 matrix A whose eigenvalues 
are estimated to be 4, —4, and 3. 

13. If the eigenvalues close to 4 and —4 are known to have 
different absolute values, will the power method work? Is 
it likely to be useful? 


14. Suppose the eigenvalues close to 4 and —4 are known to have 
exactly the same absolute value. Describe how one might 
obtain a sequence that estimates the eigenvalue close to 4. 

15. Suppose $A\mathbf{x} = \lambda\mathbf{x}$ with $\mathbf{x} \neq \mathbf{0}$. Let $\alpha$ be a scalar different
from the eigenvalues of $A$, and let $B = (A - \alpha I)^{-1}$. Subtract
$\alpha\mathbf{x}$ from both sides of the equation $A\mathbf{x} = \lambda\mathbf{x}$, and use
algebra to show that $1/(\lambda - \alpha)$ is an eigenvalue of $B$, with $\mathbf{x}$
a corresponding eigenvector.


16. Suppose $\mu$ is an eigenvalue of the $B$ in Exercise 15, and that
$\mathbf{x}$ is a corresponding eigenvector, so that $(A - \alpha I)^{-1}\mathbf{x} = \mu\mathbf{x}$.
Use this equation to find an eigenvalue of $A$ in terms of $\mu$ and
$\alpha$. [Note: $\mu \neq 0$ because $B$ is invertible.]

17. [M] Use the inverse power method to estimate the middle
eigenvalue of the $A$ in Example 3, with accuracy to four
decimal places. Set $\mathbf{x}_0 = (1, 0, 0)$.


18. [M] Let $A$ be as in Exercise 9. Use the inverse power
method with $\mathbf{x}_0 = (1, 0, 0)$ to estimate the eigenvalue of $A$
near $\alpha = -1.4$, with an accuracy to four decimal places.


[M] In Exercises 19 and 20, find (a) the largest eigenvalue and (b) 
the eigenvalue closest to zero. In each case, set $\mathbf{x}_0 = (1, 0, 0, 0)$
and carry out approximations until the approximating sequence 
seems accurate to four decimal places. Include the approximate 
eigenvector. 


19. A = 


20. A = 


10 7 

7 5 

8 6 

7 5 

1 2 
2 12 
-2 3 

4 5 


8 7 

6 5 

10 9 

9 10 

3 2" 

13 11 

0 2 

7 2 


21. A common misconception is that if $A$ has a strictly dominant
eigenvalue, then, for any sufficiently large value of $k$, the
vector $A^k\mathbf{x}$ is approximately equal to an eigenvector of $A$.
For the three matrices below, study what happens to $A^k\mathbf{x}$
when $\mathbf{x} = (.5, .5)$, and try to draw general conclusions (for
a $2 \times 2$ matrix).



0 

.2 



0 

.8 



0 

2 


SOLUTION TO PRACTICE PROBLEM 


For the given $A$ and $\mathbf{x}$,

\[ A\mathbf{x} = \begin{bmatrix} 5 & 8 & 4 \\ 8 & 3 & -1 \\ 4 & -1 & 2 \end{bmatrix} \begin{bmatrix} 1.00 \\ -4.30 \\ 8.10 \end{bmatrix} = \begin{bmatrix} 3.00 \\ -13.00 \\ 24.50 \end{bmatrix} \]

If $A\mathbf{x}$ is nearly a multiple of $\mathbf{x}$, then the ratios of corresponding entries in the two vectors
should be nearly constant. So compute:

{entry in Ax} ÷ {entry in x} = {ratio}

    3.00           1.00        3.000
  -13.00          -4.30        3.023
   24.50           8.10        3.025

Each entry in $A\mathbf{x}$ is about 3 times the corresponding entry in $\mathbf{x}$, so $\mathbf{x}$ is close to an
eigenvector. Any of the ratios above is an estimate for the eigenvalue. (To five decimal
places, the eigenvalue is 3.02409.)
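The entrywise ratio test is easy to automate. The sketch below uses the matrix and vector from this practice problem and also evaluates the Rayleigh quotient described in the exercises for this section; the variable names are illustrative.

```python
A = [[5.0, 8.0, 4.0], [8.0, 3.0, -1.0], [4.0, -1.0, 2.0]]
x = [1.0, -4.3, 8.1]

# Compute Ax and the ratio of each entry of Ax to the matching entry of x.
Ax = [sum(a * c for a, c in zip(row, x)) for row in A]
ratios = [p / q for p, q in zip(Ax, x)]

# Rayleigh quotient (x^T A x) / (x^T x), a single-number eigenvalue estimate.
rayleigh = sum(p * q for p, q in zip(Ax, x)) / sum(q * q for q in x)

print(Ax)        # entries of Ax
print(ratios)    # nearly constant ratios suggest x is close to an eigenvector
print(rayleigh)
```

The ratio test breaks down if some entry of x is zero; the Rayleigh quotient avoids that division and gives one estimate rather than several.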
CHAPTER 5 SUPPLEMENTARY EXERCISES


Throughout these supplementary exercises, A and B represent 
square matrices of appropriate sizes. 

1. Mark each statement as True or False. Justify each answer. 

a. If $A$ is invertible and 1 is an eigenvalue for $A$, then 1 is
also an eigenvalue of $A^{-1}$.

b. If $A$ is row equivalent to the identity matrix $I$, then $A$ is
diagonalizable.

c. If $A$ contains a row or column of zeros, then 0 is an
eigenvalue of $A$.

d. Each eigenvalue of $A$ is also an eigenvalue of $A^2$.

e. Each eigenvector of $A$ is also an eigenvector of $A^2$.

f. Each eigenvector of an invertible matrix $A$ is also an
eigenvector of $A^{-1}$.

g. Eigenvalues must be nonzero scalars. 

h. Eigenvectors must be nonzero vectors. 

i. Two eigenvectors corresponding to the same eigenvalue 
are always linearly dependent. 


x. If $A$ is an $n \times n$ diagonalizable matrix, then each vector
in $\mathbb{R}^n$ can be written as a linear combination of eigenvectors
of $A$.

2. Show that if $\mathbf{x}$ is an eigenvector of the matrix product $AB$ and
$B\mathbf{x} \neq \mathbf{0}$, then $B\mathbf{x}$ is an eigenvector of $BA$.

3. Suppose $\mathbf{x}$ is an eigenvector of $A$ corresponding to an eigenvalue
$\lambda$.

a. Show that $\mathbf{x}$ is an eigenvector of $5I - A$. What is the
corresponding eigenvalue?

b. Show that $\mathbf{x}$ is an eigenvector of $5I - 3A + A^2$. What is
the corresponding eigenvalue?

4. Use mathematical induction to show that if $\lambda$ is an eigenvalue
of an $n \times n$ matrix $A$, with $\mathbf{x}$ a corresponding eigenvector,
then, for each positive integer $m$, $\lambda^m$ is an eigenvalue of $A^m$,
with $\mathbf{x}$ a corresponding eigenvector.

5. If $p(t) = c_0 + c_1t + c_2t^2 + \cdots + c_nt^n$, define $p(A)$ to be
the matrix formed by replacing each power of $t$ in $p(t)$ by
the corresponding power of $A$ (with $A^0 = I$). That is,


j. Similar matrices always have exactly the same eigen¬ 
values. 

k. Similar matrices always have exactly the same eigen¬ 
vectors. 

l. The sum of two eigenvectors of a matrix A is also an 
eigenvector of A. 

m. The eigenvalues of an upper triangular matrix A are 
exactly the nonzero entries on the diagonal of A. 

n. The matrices A and A T have the same eigenvalues, 
counting multiplicities. 

o. If a 5 x 5 matrix A has fewer than 5 distinct eigenvalues, 
then A is not diagonalizable. 

p. There exists a 2 x 2 matrix that has no eigenvectors in 

R 2 . 

q. If A is diagonalizable, then the columns of A are linearly 
independent. 

r. A nonzero vector cannot correspond to two different 
eigenvalues of A. 


\[ p(A) = c_0I + c_1A + c_2A^2 + \cdots + c_nA^n \]

Show that if $\lambda$ is an eigenvalue of $A$, then one eigenvalue of
$p(A)$ is $p(\lambda)$.

r 2 o ~ 

6 . Suppose A = PDP~ l , where 尸 is 2 x 2 and D = ^ ^ . 

a. Let $B = 5I - 3A + A^2$. Show that $B$ is diagonalizable
by finding a suitable factorization of $B$.

b. Given p(t) and p(A) as in Exercise 5, show that p(A) is 
diagonalizable. 

7. Suppose A is diagonalizable and p(t) is the characteristic 
polynomial of A. Define p(A) as in Exercise 5, and show 
that p(A) is the zero matrix. This fact, which is also true for 
any square matrix, is called the Cayley-Hamilton theorem. 

8. a. Let $A$ be a diagonalizable $n \times n$ matrix. Show that if the
multiplicity of an eigenvalue $\lambda$ is $n$, then $A = \lambda I$.

. 厂 3 1" 

b. Use part (a) to show that the matrix A = ^ ^ is not 

diagonalizable. 


s. A (square) matrix $A$ is invertible if and only if there is a
coordinate system in which the transformation $\mathbf{x} \mapsto A\mathbf{x}$
is represented by a diagonal matrix.

t. If each vector $\mathbf{e}_j$ in the standard basis for $\mathbb{R}^n$ is an
eigenvector of $A$, then $A$ is a diagonal matrix.

u. If A is similar to a diagonalizable matrix B, then A is 
also diagonalizable. 


9. Show that $I - A$ is invertible when all the eigenvalues of $A$
are less than 1 in magnitude. [Hint: What would be true if
$I - A$ were not invertible?]

10. Show that if $A$ is diagonalizable, with all eigenvalues less
than 1 in magnitude, then $A^k$ tends to the zero matrix as
$k \to \infty$. [Hint: Consider $A^k\mathbf{x}$ where $\mathbf{x}$ represents any one
of the columns of $I$.]


v. If A and B are invertible n x n matrices, then AB is 
similar to BA. 

w. An n x n matrix with n linearly independent eigenvec¬ 
tors is invertible. 


11. Let $\mathbf{u}$ be an eigenvector of $A$ corresponding to an eigenvalue
$\lambda$, and let $H$ be the line in $\mathbb{R}^n$ through $\mathbf{u}$ and the origin.
a. Explain why $H$ is invariant under $A$ in the sense that $A\mathbf{x}$
is in $H$ whenever $\mathbf{x}$ is in $H$.









if possible. Use the eigenvalue command to create the diagonal
matrix $D$. If the program has a command that produces
eigenvectors, use it to create an invertible matrix $P$. Then
compute $AP - PD$ and $PDP^{-1}$. Discuss your results.


26. [M] Repeat Exercise 25 for A 


Use the results of Exercise 16 in the Supplementary Exercises
for Chapter 3 to show that the eigenvalues of $A$ are $a - b$ and
$a + (n - 1)b$. What are the multiplicities of these eigenvalues?

16. Apply the result of Exercise 15 to find the eigenvalues of the 

^7 3 3 3 3 ^ 


matrices 


and 


3 3 

3 3 

7 3 

3 7 


17. Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Recall from Exercise 25 in Section
5.4 that tr A (the trace of A) is the sum of the diagonal entries
in A. Show that the characteristic polynomial of A is

\[ \lambda^2 - (\operatorname{tr} A)\lambda + \det A \]


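Then show that the eigenvalues of a $2 \times 2$ matrix A are both
real if and only if $\det A \le \left(\dfrac{\operatorname{tr} A}{2}\right)^2$.

18. Let $A = \begin{bmatrix} .4 & -.3 \\ .4 & 1.2 \end{bmatrix}$. Explain why $A^k$ approaches
$\begin{bmatrix} -.5 & -.75 \\ 1.0 & 1.50 \end{bmatrix}$ as $k \to \infty$.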


Exercises 19-23 concern the polynomial 


\[ p(t) = a_0 + a_1 t + \cdots + a_{n-1}t^{n-1} + t^n \]


(The transpose of V was considered in Supplementary Exercise 11
in Chapter 2.) Use Exercise 22 and a theorem from
this chapter to deduce that V is invertible (but do not compute
$V^{-1}$). Then explain why $V^{-1}C_pV$ is a diagonal matrix.

24. [M] The MATLAB command roots(p) computes the 
roots of the polynomial equation p(t) = 0. Read a MATLAB 
manual, and then describe the basic idea behind the algorithm 
for the roots command. 

25. [M] Use a matrix program to diagonalize

\[ A = \begin{bmatrix} -3 & -2 & 0 \\ 14 & 7 & -1 \\ -6 & -3 & 1 \end{bmatrix} \]


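and an $n \times n$ matrix $C_p$ called the companion matrix of p:

\[ C_p = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{bmatrix} \]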


19. Write the companion matrix $C_p$ for $p(t) = 6 - 5t + t^2$, and
then find the characteristic polynomial of $C_p$.

20. Let $p(t) = (t - 2)(t - 3)(t - 4) = -24 + 26t - 9t^2 + t^3$.
Write the companion matrix for $p(t)$, and use techniques
from Chapter 3 to find its characteristic polynomial.

21. Use mathematical induction to prove that for $n \ge 2$,

\[ \det(C_p - \lambda I) = (-1)^n(a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n) = (-1)^n p(\lambda) \]

[Hint: Expanding by cofactors down the first column, show
that $\det(C_p - \lambda I)$ has the form $(-\lambda)B + (-1)^n a_0$, where B
is a certain polynomial (by the induction assumption).]

22. Let $p(t) = a_0 + a_1 t + a_2 t^2 + t^3$, and let $\lambda$ be a zero of p.

a. Write the companion matrix for p.

b. Explain why $\lambda^3 = -a_0 - a_1\lambda - a_2\lambda^2$, and show that
$(1, \lambda, \lambda^2)$ is an eigenvector of the companion matrix for
p.

23. Let p be the polynomial in Exercise 22, and suppose the
equation $p(t) = 0$ has distinct roots $\lambda_1, \lambda_2, \lambda_3$. Let V be
the Vandermonde matrix


A = 


15. Let J be the $n \times n$ matrix of all 1's, and consider
$A = (a - b)I + bJ$; that is,


b. Let K be a one-dimensional subspace of $\mathbb{R}^n$ that is invariant
under A. Explain why K contains an eigenvector of
A.


12. Let $G = \begin{bmatrix} A & X \\ 0 & B \end{bmatrix}$. Use formula (1) for the determinant
in Section 5.2 to explain why $\det G = (\det A)(\det B)$. From
this, deduce that the characteristic polynomial of G is the
product of the characteristic polynomials of A and B.

Use Exercise 12 to find the eigenvalues of the matrices in Exercises
13 and 14.


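\[ A = \begin{bmatrix} a & b & b & \cdots & b \\ b & a & b & \cdots & b \\ b & b & a & \cdots & b \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b & b & b & \cdots & a \end{bmatrix} \]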


0 2 3 0 


11 11 


5 2 8 2 


8 5 0 

_ II 


2 2 1 

2 12 

12 2 


11 3 2 3 
1 - A2 t 2 a 2 
1a1a?1 




2 5 4 5 4 0 0 


3 0 0 12 0 0 


3 . 


y4 

14 . 
Orthogonality and
Least Squares


INTRODUCTORY EXAMPLE 

The North American Datum 
and GPS Navigation 

Imagine starting a massive project that you estimate will 
take ten years and require the efforts of scores of people 
to construct and solve a 1,800,000 by 900,000 system 
of linear equations. That is exactly what the National 
Geodetic Survey did in 1974, when it set out to update 
the North American Datum (NAD)—a network of 268,000 
precisely located reference points that span the entire North 
American continent, together with Greenland, Hawaii, the 
Virgin Islands, Puerto Rico, and other Caribbean islands. 

The recorded latitudes and longitudes in the NAD 
must be determined to within a few centimeters because 
they form the basis for all surveys, maps, legal property 
boundaries, and layouts of civil engineering projects 
such as highways and public utility lines. However, 
more than 200,000 new points had been added to the 
datum since the last adjustment in 1927, and errors had 
gradually accumulated over the years, due to imprecise 
measurements and shifts in the earth’s crust. Data 
gathering for the NAD readjustment was completed in 
1983. 

The system of equations for the NAD had no solution 
in the ordinary sense, but rather had a least-squares 
solution, which assigned latitudes and longitudes to the 
reference points in a way that corresponded best to the 1.8 
million observations. The least-squares solution was found 
in 1986 by solving a related system of so-called
normal equations, which involved 928,735 equations in
928,735 variables.¹

More recently, knowledge of reference points on the 
ground has become crucial for accurately determining 
the locations of satellites in the satellite-based Global 
Positioning System (GPS). A GPS satellite calculates its 
position relative to the earth by measuring the time it takes 
for signals to arrive from three ground transmitters. To do 
this, the satellites use precise atomic clocks that have been 
synchronized with ground stations (whose locations are 
known accurately because of the NAD). 

The Global Positioning System is used both for 
determining the locations of new reference points on the 
ground and for finding a user’s position on the ground 
relative to established maps. When a car driver (or a 
mountain climber) turns on a GPS receiver, the receiver 
measures the relative arrival times of signals from at 
least three satellites. This information, together with the 
transmitted data about the satellites’ locations and message 
times, is used to adjust the GPS receiver’s time and to 
determine its approximate location on the earth. Given 
information from a fourth satellite, the GPS receiver can 
even establish its approximate altitude. 

¹ A mathematical discussion of the solution strategy (along with details
of the entire NAD project) appears in North American Datum of 1983,
Charles R. Schwarz (ed.), National Geodetic Survey, National Oceanic
and Atmospheric Administration (NOAA) Professional Paper NOS 2,
1989.
Both the NAD and GPS problems are solved by finding
a vector that “approximately satisfies” an inconsistent
system of equations. A careful explanation of this apparent
contradiction will require ideas developed in the first five
sections of this chapter.


In order to find an approximate solution to an inconsistent system of equations that has 
no actual solution, a well-defined notion of nearness is needed. Section 6.1 introduces 
the concepts of distance and orthogonality in a vector space. Sections 6.2 and 6.3 show 
how orthogonality can be used to identify the point within a subspace W that is nearest 
to a point y lying outside of W. By taking W to be the column space of a matrix, 
Section 6.5 develops a method for producing approximate (“least-squares”) solutions
for inconsistent linear systems, such as the system solved for the NAD report. 

Section 6.4 provides another opportunity to see orthogonal projections at work, 
creating a matrix factorization widely used in numerical linear algebra. The remaining 
sections examine some of the many least-squares problems that arise in applications, 
including those in vector spaces more general than $\mathbb{R}^n$.

6.1 INNER PRODUCT, LENGTH, AND ORTHOGONALITY 

Geometric concepts of length, distance, and perpendicularity, which are well known for 
$\mathbb{R}^2$ and $\mathbb{R}^3$, are defined here for $\mathbb{R}^n$. These concepts provide powerful geometric tools
for solving many applied problems, including the least-squares problems mentioned 
above. All three notions are defined in terms of the inner product of two vectors. 


The Inner Product 

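If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $\mathbb{R}^n$, then we regard $\mathbf{u}$ and $\mathbf{v}$ as $n \times 1$ matrices. The transpose
$\mathbf{u}^T$ is a $1 \times n$ matrix, and the matrix product $\mathbf{u}^T\mathbf{v}$ is a $1 \times 1$ matrix, which we write as
a single real number (a scalar) without brackets. The number $\mathbf{u}^T\mathbf{v}$ is called the inner
product of $\mathbf{u}$ and $\mathbf{v}$, and often it is written as $\mathbf{u} \cdot \mathbf{v}$. This inner product, mentioned in the
exercises for Section 2.1, is also referred to as a dot product. If

\[ \mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \]

then the inner product of $\mathbf{u}$ and $\mathbf{v}$ is

\[ \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1v_1 + u_2v_2 + \cdots + u_nv_n \]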
FIGURE 1 Interpretation of $\|\mathbf{v}\|$ as length.


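EXAMPLE 1 Compute $\mathbf{u} \cdot \mathbf{v}$ and $\mathbf{v} \cdot \mathbf{u}$ for $\mathbf{u} = \begin{bmatrix} 2 \\ -5 \\ -1 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 3 \\ 2 \\ -3 \end{bmatrix}$.

SOLUTION

\[ \mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \begin{bmatrix} 2 & -5 & -1 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \\ -3 \end{bmatrix} = (2)(3) + (-5)(2) + (-1)(-3) = -1 \]

\[ \mathbf{v} \cdot \mathbf{u} = \mathbf{v}^T\mathbf{u} = \begin{bmatrix} 3 & 2 & -3 \end{bmatrix} \begin{bmatrix} 2 \\ -5 \\ -1 \end{bmatrix} = (3)(2) + (2)(-5) + (-3)(-1) = -1 \]

■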
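The row-times-column computation in Example 1 can be sketched in a few lines of Python. This is only an illustration added here, not part of the text; the helper name `dot` is an assumption, and vectors are represented as plain lists.

```python
def dot(u, v):
    """Inner product u . v = u1*v1 + ... + un*vn of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of entries")
    return sum(ui * vi for ui, vi in zip(u, v))

u = [2, -5, -1]
v = [3, 2, -3]
uv = dot(u, v)   # (2)(3) + (-5)(2) + (-1)(-3)
vu = dot(v, u)   # same products in the other order
print(uv, vu)    # -1 -1
```

Both orders give the same value because each term $u_iv_i$ equals $v_iu_i$.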


It is clear from the calculations in Example 1 why $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$. This commutativity
of the inner product holds in general. The following properties of the inner product 
are easily deduced from properties of the transpose operation in Section 2.1. (See 
Exercises 21 and 22 at the end of this section.) 


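THEOREM 1

Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vectors in $\mathbb{R}^n$, and let c be a scalar. Then

a. $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$

b. $(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w}$

c. $(c\mathbf{u}) \cdot \mathbf{v} = c(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot (c\mathbf{v})$

d. $\mathbf{u} \cdot \mathbf{u} \ge 0$, and $\mathbf{u} \cdot \mathbf{u} = 0$ if and only if $\mathbf{u} = \mathbf{0}$


Properties (b) and (c) can be combined several times to produce the following useful
rule:

\[ (c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p) \cdot \mathbf{w} = c_1(\mathbf{u}_1 \cdot \mathbf{w}) + \cdots + c_p(\mathbf{u}_p \cdot \mathbf{w}) \]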


The Length of a Vector 

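If $\mathbf{v}$ is in $\mathbb{R}^n$, with entries $v_1, \ldots, v_n$, then the square root of $\mathbf{v} \cdot \mathbf{v}$ is defined because $\mathbf{v} \cdot \mathbf{v}$
is nonnegative.

DEFINITION The length (or norm) of $\mathbf{v}$ is the nonnegative scalar $\|\mathbf{v}\|$ defined by

\[ \|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}, \quad \text{and} \quad \|\mathbf{v}\|^2 = \mathbf{v} \cdot \mathbf{v} \]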


Suppose $\mathbf{v}$ is in $\mathbb{R}^2$, say, $\mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix}$. If we identify $\mathbf{v}$ with a geometric point in the
plane, as usual, then $\|\mathbf{v}\|$ coincides with the standard notion of the length of the line
segment from the origin to $\mathbf{v}$. This follows from the Pythagorean Theorem applied to a
triangle such as the one in Fig. 1.

A similar calculation with the diagonal of a rectangular box shows that the definition 
of length of a vector v in R 3 coincides with the usual notion of length. 

For any scalar c, the length of $c\mathbf{v}$ is $|c|$ times the length of $\mathbf{v}$. That is,

\[ \|c\mathbf{v}\| = |c|\,\|\mathbf{v}\| \]

(To see this, compute $\|c\mathbf{v}\|^2 = (c\mathbf{v}) \cdot (c\mathbf{v}) = c^2\,\mathbf{v} \cdot \mathbf{v} = c^2\|\mathbf{v}\|^2$ and take square roots.)

A vector whose length is 1 is called a unit vector. If we divide a nonzero vector $\mathbf{v}$
by its length (that is, multiply by $1/\|\mathbf{v}\|$), we obtain a unit vector $\mathbf{u}$ because the length
of $\mathbf{u}$ is $(1/\|\mathbf{v}\|)\|\mathbf{v}\| = 1$. The process of creating $\mathbf{u}$ from $\mathbf{v}$ is sometimes called normalizing
$\mathbf{v}$, and we say that $\mathbf{u}$ is in the same direction as $\mathbf{v}$.

Several examples that follow use the space-saving notation for (column) vectors. 


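EXAMPLE 2 Let $\mathbf{v} = (1, -2, 2, 0)$. Find a unit vector $\mathbf{u}$ in the same direction as $\mathbf{v}$.

SOLUTION First, compute the length of $\mathbf{v}$:

\[ \|\mathbf{v}\|^2 = \mathbf{v} \cdot \mathbf{v} = (1)^2 + (-2)^2 + (2)^2 + (0)^2 = 9, \qquad \|\mathbf{v}\| = \sqrt{9} = 3 \]

Then, multiply $\mathbf{v}$ by $1/\|\mathbf{v}\|$ to obtain

\[ \mathbf{u} = \frac{1}{\|\mathbf{v}\|}\mathbf{v} = \frac{1}{3}\mathbf{v} = \frac{1}{3}\begin{bmatrix} 1 \\ -2 \\ 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/3 \\ -2/3 \\ 2/3 \\ 0 \end{bmatrix} \]

To check that $\|\mathbf{u}\| = 1$, it suffices to show that $\|\mathbf{u}\|^2 = 1$.

\[ \|\mathbf{u}\|^2 = \mathbf{u} \cdot \mathbf{u} = \left(\tfrac{1}{3}\right)^2 + \left(-\tfrac{2}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 + (0)^2 = \tfrac{1}{9} + \tfrac{4}{9} + \tfrac{4}{9} + 0 = 1 \]

■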
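The normalizing step of Example 2 might be sketched as follows. The function names `norm` and `normalize` are assumptions made for this illustration, and floating-point arithmetic is used, so results are approximate in general.

```python
import math

def norm(v):
    """Length ||v|| = sqrt(v . v)."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Return the unit vector (1/||v||) v; the zero vector has no direction."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / length for x in v]

v = [1, -2, 2, 0]
u = normalize(v)          # approximately [1/3, -2/3, 2/3, 0]
print(norm(v), norm(u))
```

Checking `norm(u)` afterward mirrors the verification $\|\mathbf{u}\|^2 = 1$ carried out in the example.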

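EXAMPLE 3 Let W be the subspace of $\mathbb{R}^2$ spanned by $\mathbf{x} = (\tfrac{2}{3}, 1)$. Find a unit
vector $\mathbf{z}$ that is a basis for W.

SOLUTION W consists of all multiples of $\mathbf{x}$, as in Fig. 2(a). Any nonzero vector in W
is a basis for W. To simplify the calculation, “scale” $\mathbf{x}$ to eliminate fractions. That is,
multiply $\mathbf{x}$ by 3 to get

\[ \mathbf{y} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \]

Now compute $\|\mathbf{y}\|^2 = 2^2 + 3^2 = 13$, $\|\mathbf{y}\| = \sqrt{13}$, and normalize $\mathbf{y}$ to get

\[ \mathbf{z} = \frac{1}{\sqrt{13}}\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2/\sqrt{13} \\ 3/\sqrt{13} \end{bmatrix} \]

See Fig. 2(b). Another unit vector is $(-2/\sqrt{13}, -3/\sqrt{13})$. ■

FIGURE 2 Normalizing a vector to produce a unit vector.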


Distance in $\mathbb{R}^n$

We are ready now to describe how close one vector is to another. Recall that if a and b
are real numbers, the distance on the number line between a and b is the number $|a - b|$.
Two examples are shown in Fig. 3. This definition of distance in $\mathbb{R}$ has a direct analogue
in $\mathbb{R}^n$.


FIGURE 3 Distances in $\mathbb{R}$. Two number-line panels: $a = 2$ and $b = 8$ are 6 units apart,
since $|2 - 8| = |-6| = 6$ or $|8 - 2| = |6| = 6$; $a = -3$ and $b = 4$ are 7 units apart,
since $|(-3) - 4| = |-7| = 7$ or $|4 - (-3)| = |7| = 7$.
DEFINITION For $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$, the distance between $\mathbf{u}$ and $\mathbf{v}$, written as $\operatorname{dist}(\mathbf{u}, \mathbf{v})$, is the
length of the vector $\mathbf{u} - \mathbf{v}$. That is,

\[ \operatorname{dist}(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| \]


In $\mathbb{R}^2$ and $\mathbb{R}^3$, this definition of distance coincides with the usual formulas for the
Euclidean distance between two points, as the next two examples show.

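EXAMPLE 4 Compute the distance between the vectors $\mathbf{u} = (7, 1)$ and $\mathbf{v} = (3, 2)$.

SOLUTION Calculate

\[ \mathbf{u} - \mathbf{v} = \begin{bmatrix} 7 \\ 1 \end{bmatrix} - \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ -1 \end{bmatrix} \]

\[ \|\mathbf{u} - \mathbf{v}\| = \sqrt{4^2 + (-1)^2} = \sqrt{17} \]

The vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} - \mathbf{v}$ are shown in Fig. 4. When the vector $\mathbf{u} - \mathbf{v}$ is added
to $\mathbf{v}$, the result is $\mathbf{u}$. Notice that the parallelogram in Fig. 4 shows that the distance from
$\mathbf{u}$ to $\mathbf{v}$ is the same as the distance from $\mathbf{u} - \mathbf{v}$ to $\mathbf{0}$. ■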
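The definition $\operatorname{dist}(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$ translates directly into code. The sketch below assumes vectors stored as Python lists; the name `dist` is chosen here for illustration.

```python
import math

def dist(u, v):
    """Distance between u and v: the length of the difference vector u - v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of entries")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d = dist([7, 1], [3, 2])   # length of (4, -1), i.e. sqrt(17)
print(d)
```

Because each difference is squared, `dist(u, v)` and `dist(v, u)` agree, just as $|a - b| = |b - a|$ on the number line.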



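FIGURE 4 The distance between $\mathbf{u}$ and $\mathbf{v}$ is
the length of $\mathbf{u} - \mathbf{v}$.


EXAMPLE 5 If $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$, then

\[ \operatorname{dist}(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v})} = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + (u_3 - v_3)^2} \]

■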


Orthogonal Vectors 

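The rest of this chapter depends on the fact that the concept of perpendicular lines in
ordinary Euclidean geometry has an analogue in $\mathbb{R}^n$.

Consider $\mathbb{R}^2$ or $\mathbb{R}^3$ and two lines through the origin determined by vectors $\mathbf{u}$ and $\mathbf{v}$.
The two lines shown in Fig. 5 are geometrically perpendicular if and only if the distance
from $\mathbf{u}$ to $\mathbf{v}$ is the same as the distance from $\mathbf{u}$ to $-\mathbf{v}$. This is the same as requiring the
squares of the distances to be the same. Now

\[
\begin{aligned}
[\operatorname{dist}(\mathbf{u}, -\mathbf{v})]^2 &= \|\mathbf{u} - (-\mathbf{v})\|^2 = \|\mathbf{u} + \mathbf{v}\|^2 \\
&= (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) \\
&= \mathbf{u} \cdot (\mathbf{u} + \mathbf{v}) + \mathbf{v} \cdot (\mathbf{u} + \mathbf{v}) \\
&= \mathbf{u} \cdot \mathbf{u} + \mathbf{u} \cdot \mathbf{v} + \mathbf{v} \cdot \mathbf{u} + \mathbf{v} \cdot \mathbf{v} && \text{Theorem 1(a), (b)} \\
&= \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 + 2\,\mathbf{u} \cdot \mathbf{v} && \text{Theorem 1(a)} \qquad (1)
\end{aligned}
\]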
FIGURE 7 A plane and line through $\mathbf{0}$ as orthogonal complements.

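The same calculations with $\mathbf{v}$ and $-\mathbf{v}$ interchanged show that

\[
\begin{aligned}
[\operatorname{dist}(\mathbf{u}, \mathbf{v})]^2 &= \|\mathbf{u}\|^2 + \|-\mathbf{v}\|^2 + 2\,\mathbf{u} \cdot (-\mathbf{v}) \\
&= \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\,\mathbf{u} \cdot \mathbf{v}
\end{aligned}
\]

The two squared distances are equal if and only if $2\,\mathbf{u} \cdot \mathbf{v} = -2\,\mathbf{u} \cdot \mathbf{v}$, which happens if and
only if $\mathbf{u} \cdot \mathbf{v} = 0$.

This calculation shows that when vectors $\mathbf{u}$ and $\mathbf{v}$ are identified with geometric
points, the corresponding lines through the points and the origin are perpendicular if
and only if $\mathbf{u} \cdot \mathbf{v} = 0$. The following definition generalizes to $\mathbb{R}^n$ this notion of
perpendicularity (or orthogonality, as it is commonly called in linear algebra).


DEFINITION Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are orthogonal (to each other) if $\mathbf{u} \cdot \mathbf{v} = 0$.

Observe that the zero vector is orthogonal to every vector in $\mathbb{R}^n$ because $\mathbf{0}^T\mathbf{v} = 0$
for all $\mathbf{v}$.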

The next theorem provides a useful fact about orthogonal vectors. The proof follows
immediately from the calculation in (1) above and the definition of orthogonality.
The right triangle shown in Fig. 6 provides a visualization of the lengths that appear in
the theorem.


THEOREM 2 The Pythagorean Theorem

Two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if and only if $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$.
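Equation (1) says that $\|\mathbf{u} + \mathbf{v}\|^2$ exceeds $\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$ by exactly $2\,\mathbf{u} \cdot \mathbf{v}$, so the Pythagorean identity holds precisely when that gap is zero. A small numerical sketch, with the hypothetical helper name `pythagorean_gap` and integer test vectors chosen for this illustration:

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def pythagorean_gap(u, v):
    """Return ||u + v||^2 - (||u||^2 + ||v||^2); by equation (1) this is 2 (u . v)."""
    s = [a + b for a, b in zip(u, v)]
    return dot(s, s) - dot(u, u) - dot(v, v)

gap_orthogonal = pythagorean_gap([3, 1, 1], [-1, 2, 1])   # u . v = 0
gap_general = pythagorean_gap([7, 1], [3, 2])             # u . v = 23
print(gap_orthogonal, gap_general)                        # 0 46
```

With integer entries the arithmetic is exact, so the comparison with zero is safe; with floating-point data a tolerance would be needed.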


Orthogonal Complements 

To provide practice using inner products, we introduce a concept here that will be of use
in Section 6.3 and elsewhere in the chapter. If a vector $\mathbf{z}$ is orthogonal to every vector
in a subspace W of $\mathbb{R}^n$, then $\mathbf{z}$ is said to be orthogonal to W. The set of all vectors $\mathbf{z}$
that are orthogonal to W is called the orthogonal complement of W and is denoted by
$W^\perp$ (and read as “W perpendicular” or simply “W perp”).

EXAMPLE 6 Let W be a plane through the origin in $\mathbb{R}^3$, and let L be the line
through the origin and perpendicular to W. If $\mathbf{z}$ and $\mathbf{w}$ are nonzero, $\mathbf{z}$ is on L, and
$\mathbf{w}$ is in W, then the line segment from $\mathbf{0}$ to $\mathbf{z}$ is perpendicular to the line segment from $\mathbf{0}$
to $\mathbf{w}$; that is, $\mathbf{z} \cdot \mathbf{w} = 0$. See Fig. 7. So each vector on L is orthogonal to every $\mathbf{w}$ in W.
In fact, L consists of all vectors that are orthogonal to the $\mathbf{w}$’s in W, and W consists of
all vectors orthogonal to the $\mathbf{z}$’s in L. That is,

\[ L = W^\perp \quad \text{and} \quad W = L^\perp \]

■


The following two facts about $W^\perp$, with W a subspace of $\mathbb{R}^n$, are needed later
in the chapter. Proofs are suggested in Exercises 29 and 30. Exercises 27-31 provide
excellent practice using properties of the inner product.

1. A vector $\mathbf{x}$ is in $W^\perp$ if and only if $\mathbf{x}$ is orthogonal to every vector in a set that
spans W.

2. $W^\perp$ is a subspace of $\mathbb{R}^n$.
The next theorem and Exercise 31 verify the claims made in Section 4.6 concerning
the subspaces shown in Fig. 8. (Also see Exercise 28 in Section 4.6.)

FIGURE 8 The fundamental subspaces determined
by an $m \times n$ matrix A.


THEOREM 3

Let A be an $m \times n$ matrix. The orthogonal complement of the row space of A is
the null space of A, and the orthogonal complement of the column space of A is
the null space of $A^T$:

\[ (\operatorname{Row} A)^\perp = \operatorname{Nul} A \quad \text{and} \quad (\operatorname{Col} A)^\perp = \operatorname{Nul} A^T \]


PROOF The row-column rule for computing $A\mathbf{x}$ shows that if $\mathbf{x}$ is in Nul A, then $\mathbf{x}$ is
orthogonal to each row of A (with the rows treated as vectors in $\mathbb{R}^n$). Since the rows
of A span the row space, $\mathbf{x}$ is orthogonal to Row A. Conversely, if $\mathbf{x}$ is orthogonal to
Row A, then $\mathbf{x}$ is certainly orthogonal to each row of A, and hence $A\mathbf{x} = \mathbf{0}$. This proves
the first statement of the theorem. Since this statement is true for any matrix, it is true
for $A^T$. That is, the orthogonal complement of the row space of $A^T$ is the null space of
$A^T$. This proves the second statement, because Row $A^T$ = Col A.
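The first statement of Theorem 3 can be spot-checked numerically. The matrix `A`, the null-space vector `x`, and the particular row combination below are all assumptions chosen for this sketch; they do not come from the text.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    """Row-column rule: entry i of A x is (row i of A) . x."""
    return [dot(row, x) for row in A]

A = [[1, 0, -1],
     [0, 1, 2]]
x = [1, -2, 1]              # chosen so that A x = 0, i.e. x is in Nul A
Ax = mat_vec(A, x)

# An arbitrary vector in Row A: 3*(row 1) - 5*(row 2)
r = [3 * a - 5 * b for a, b in zip(A[0], A[1])]
print(Ax, r, dot(r, x))     # [0, 0] [3, -5, -13] 0
```

Since `x` is orthogonal to each row, Theorem 1(b) and (c) force it to be orthogonal to every linear combination of the rows, which is what the last inner product illustrates.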


Angles in R 2 and R 3 (Optional) 

If $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors in either $\mathbb{R}^2$ or $\mathbb{R}^3$, then there is a nice connection between
their inner product and the angle $\vartheta$ between the two line segments from the origin to the
points identified with $\mathbf{u}$ and $\mathbf{v}$. The formula is

\[ \mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\vartheta \qquad (2) \]

To verify this formula for vectors in $\mathbb{R}^2$, consider the triangle shown in Fig. 9, with sides
of lengths $\|\mathbf{u}\|$, $\|\mathbf{v}\|$, and $\|\mathbf{u} - \mathbf{v}\|$. By the law of cosines,

\[ \|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\,\|\mathbf{v}\| \cos\vartheta \]

FIGURE 9 The angle between two vectors.

which can be rearranged to produce

\[
\begin{aligned}
\|\mathbf{u}\|\,\|\mathbf{v}\| \cos\vartheta &= \tfrac{1}{2}\left[\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{u} - \mathbf{v}\|^2\right] \\
&= \tfrac{1}{2}\left[u_1^2 + u_2^2 + v_1^2 + v_2^2 - (u_1 - v_1)^2 - (u_2 - v_2)^2\right] \\
&= u_1v_1 + u_2v_2 \\
&= \mathbf{u} \cdot \mathbf{v}
\end{aligned}
\]

The verification for $\mathbb{R}^3$ is similar. When $n > 3$, formula (2) may be used to define the
angle between two vectors in $\mathbb{R}^n$. In statistics, for instance, the value of $\cos\vartheta$ defined
by (2) for suitable vectors $\mathbf{u}$ and $\mathbf{v}$ is what statisticians call a correlation coefficient.
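Solving formula (2) for $\cos\vartheta$ gives a quantity that is easy to compute in any dimension. The following sketch assumes nonzero vectors stored as lists; `cos_angle` is a name introduced here for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cos_angle(u, v):
    """cos(theta) = (u . v) / (||u|| ||v||), from formula (2); u and v must be nonzero."""
    nu = math.sqrt(dot(u, u))
    nv = math.sqrt(dot(v, v))
    if nu == 0 or nv == 0:
        raise ValueError("formula (2) requires nonzero vectors")
    return dot(u, v) / (nu * nv)

c45 = cos_angle([1, 0], [1, 1])          # vectors 45 degrees apart
c90 = cos_angle([3, 1, 1], [-1, 2, 1])   # orthogonal vectors
print(c45, c90)
```

A value of 0 corresponds to orthogonal vectors, consistent with the definition of orthogonality given earlier in the section.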


PRACTICE PROBLEMS 


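Let $\mathbf{a} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} -3 \\ 1 \end{bmatrix}$, $\mathbf{c} = \begin{bmatrix} 4/3 \\ -1 \\ 2/3 \end{bmatrix}$, and $\mathbf{d} = \begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix}$.

1. Compute $\dfrac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}$ and $\left(\dfrac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}\right)\mathbf{a}$.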


2. Find a unit vector u in the direction of c. 

3. Show that d is orthogonal to c. 


4. Use the results of Practice Problems 2 and 3 to explain why d must be orthogonal to 
the unit vector u. 


6.1 EXERCISES 


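Compute the quantities in Exercises 1-8 using the vectors

\[ \mathbf{u} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} 4 \\ 6 \end{bmatrix}, \quad \mathbf{w} = \begin{bmatrix} 3 \\ -1 \\ -5 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 6 \\ -2 \\ 3 \end{bmatrix} \]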


1. u«u, y-u, and 


2. w«w, x-w, and 


1 

3. - w 

w*w 


4. 


- u 

u-u 


5. 



6 . 



x 


7. IHI 8. ||x|| 

In Exercises 9-12, find a unit vector in the direction of the given 
vector. 


9. 


-30 

40 


—6 

10. 4 

-3 


11 . 


7/4 

1/2 


12 . 


8/3 

2 


13. 


Find the distance between x = 


10 

-3 


and y 




0" 


"-4" 

14. Find the distance between u = 

-5 

and z = 

-1 


_ 2_ 


8 


Determine which pairs of vectors in Exercises 15-18 are orthogonal.



17. 


3' 


"-4" 

2 


1 

-5 

,v — 

-2 

0 


6 



In Exercises 19 and 20, all vectors are in $\mathbb{R}^n$. Mark each statement
True or False. Justify each answer.

19. a. $\mathbf{v} \cdot \mathbf{v} = \|\mathbf{v}\|^2$.

b. For any scalar c, $\mathbf{u} \cdot (c\mathbf{v}) = c(\mathbf{u} \cdot \mathbf{v})$.

c. If the distance from $\mathbf{u}$ to $\mathbf{v}$ equals the distance from $\mathbf{u}$ to
$-\mathbf{v}$, then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.

d. For a square matrix A, vectors in Col A are orthogonal to
vectors in Nul A.

e. If vectors $\mathbf{v}_1, \ldots, \mathbf{v}_p$ span a subspace W and if $\mathbf{x}$ is
orthogonal to each $\mathbf{v}_j$ for $j = 1, \ldots, p$, then $\mathbf{x}$ is in $W^\perp$.

20. a. $\mathbf{u} \cdot \mathbf{v} - \mathbf{v} \cdot \mathbf{u} = 0$.

b. For any scalar c, $\|c\mathbf{v}\| = c\|\mathbf{v}\|$.

c. If $\mathbf{x}$ is orthogonal to every vector in a subspace W, then $\mathbf{x}$
is in $W^\perp$.

d. If $\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 = \|\mathbf{u} + \mathbf{v}\|^2$, then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.

e. For an $m \times n$ matrix A, vectors in the null space of A are
orthogonal to vectors in the row space of A.


21. Use the transpose definition of the inner product to verify
parts (b) and (c) of Theorem 1. Mention the appropriate facts
from Chapter 2.


22. Let $\mathbf{u} = (u_1, u_2, u_3)$. Explain why $\mathbf{u} \cdot \mathbf{u} \ge 0$. When is
$\mathbf{u} \cdot \mathbf{u} = 0$?



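23. Let $\mathbf{u} = \begin{bmatrix} 2 \\ -5 \\ -1 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} -7 \\ -4 \\ 6 \end{bmatrix}$. Compute and compare
$\mathbf{u} \cdot \mathbf{v}$, $\|\mathbf{u}\|^2$, $\|\mathbf{v}\|^2$, and $\|\mathbf{u} + \mathbf{v}\|^2$. Do not use the Pythagorean
Theorem.


24. Verify the parallelogram law for vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$:

\[ \|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\|\mathbf{u}\|^2 + 2\|\mathbf{v}\|^2 \]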


25. Let y : 


.Describe the set H of vectors 




orthogonal to y. [Hint: Consider y = 0 and y ^ 0. 


that are 


26. Let u 


and let W be the set of all x in R 3 such that 


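$\mathbf{u} \cdot \mathbf{x} = 0$. What theorem in Chapter 4 can be used to show that
W is a subspace of $\mathbb{R}^3$? Describe W in geometric language.

27. Suppose a vector $\mathbf{y}$ is orthogonal to vectors $\mathbf{u}$ and $\mathbf{v}$. Show
that $\mathbf{y}$ is orthogonal to the vector $\mathbf{u} + \mathbf{v}$.


28. Suppose $\mathbf{y}$ is orthogonal to $\mathbf{u}$ and $\mathbf{v}$. Show that $\mathbf{y}$ is orthogonal
to every $\mathbf{w}$ in Span$\{\mathbf{u}, \mathbf{v}\}$. [Hint: An arbitrary $\mathbf{w}$
in Span$\{\mathbf{u}, \mathbf{v}\}$ has the form $\mathbf{w} = c_1\mathbf{u} + c_2\mathbf{v}$. Show that $\mathbf{y}$ is
orthogonal to such a vector $\mathbf{w}$.]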



29. Let W = Span$\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$. Show that if $\mathbf{x}$ is orthogonal to
each $\mathbf{v}_j$, for $1 \le j \le p$, then $\mathbf{x}$ is orthogonal to every vector
in W.


30. Let W be a subspace of $\mathbb{R}^n$, and let $W^\perp$ be the set of all
vectors orthogonal to W. Show that $W^\perp$ is a subspace of $\mathbb{R}^n$
using the following steps.

a. Take $\mathbf{z}$ in $W^\perp$, and let $\mathbf{u}$ represent any element of W.
Then $\mathbf{z} \cdot \mathbf{u} = 0$. Take any scalar c and show that $c\mathbf{z}$ is
orthogonal to $\mathbf{u}$. (Since $\mathbf{u}$ was an arbitrary element of W,
this will show that $c\mathbf{z}$ is in $W^\perp$.)

b. Take $\mathbf{z}_1$ and $\mathbf{z}_2$ in $W^\perp$, and let $\mathbf{u}$ be any element of W.
Show that $\mathbf{z}_1 + \mathbf{z}_2$ is orthogonal to $\mathbf{u}$. What can you
conclude about $\mathbf{z}_1 + \mathbf{z}_2$? Why?

c. Finish the proof that $W^\perp$ is a subspace of $\mathbb{R}^n$.

31. Show that if $\mathbf{x}$ is in both W and $W^\perp$, then $\mathbf{x} = \mathbf{0}$.

32. [M] Construct a pair $\mathbf{u}$, $\mathbf{v}$ of random vectors in $\mathbb{R}^4$, and let


. .5 .5 — .5 — .5 

=.5 -.5 .5 -.5 

•5 — .5 — .5 .5 

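a. Denote the columns of A by $\mathbf{a}_1, \ldots, \mathbf{a}_4$. Compute
the length of each column, and compute $\mathbf{a}_1 \cdot \mathbf{a}_2$,
$\mathbf{a}_1 \cdot \mathbf{a}_3$, $\mathbf{a}_1 \cdot \mathbf{a}_4$, $\mathbf{a}_2 \cdot \mathbf{a}_3$, $\mathbf{a}_2 \cdot \mathbf{a}_4$, and $\mathbf{a}_3 \cdot \mathbf{a}_4$.

b. Compute and compare the lengths of $\mathbf{u}$, $A\mathbf{u}$, $\mathbf{v}$, and $A\mathbf{v}$.

c. Use equation (2) in this section to compute the cosine of
the angle between $\mathbf{u}$ and $\mathbf{v}$. Compare this with the cosine
of the angle between $A\mathbf{u}$ and $A\mathbf{v}$.

d. Repeat parts (b) and (c) for two other pairs of random
vectors. What do you conjecture about the effect of A on
vectors?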

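33. [M] Generate random vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{v}$ in $\mathbb{R}^4$ with integer
entries (and $\mathbf{v} \ne \mathbf{0}$), and compute the quantities

\[ \left(\frac{\mathbf{x} \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}}\right)\mathbf{v}, \quad \left(\frac{\mathbf{y} \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}}\right)\mathbf{v}, \quad \frac{(\mathbf{x} + \mathbf{y}) \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}}\,\mathbf{v}, \quad \frac{(10\mathbf{x}) \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}}\,\mathbf{v} \]

Repeat the computations with new random vectors $\mathbf{x}$ and
$\mathbf{y}$. What do you conjecture about the mapping $\mathbf{x} \mapsto T(\mathbf{x}) = \left(\dfrac{\mathbf{x} \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}}\right)\mathbf{v}$
(for $\mathbf{v} \ne \mathbf{0}$)? Verify your conjecture algebraically.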


34. [M] Let

\[ A = \begin{bmatrix} -6 & 3 & -27 & -33 & -13 \\ 6 & -5 & 25 & 28 & 14 \\ 8 & -6 & 34 & 38 & 18 \\ 12 & -10 & 50 & 41 & 23 \\ 14 & -21 & 49 & 29 & 33 \end{bmatrix} \]

Construct a matrix N whose columns form a basis for Nul A, and
construct a matrix R whose rows form a basis for Row A (see
Section 4.6 for details). Perform a matrix computation with
N and R that illustrates a fact from Theorem 3.




















SOLUTIONS TO PRACTICE PROBLEMS 


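1. $\mathbf{a} \cdot \mathbf{b} = 7$, $\mathbf{a} \cdot \mathbf{a} = 5$. Hence $\dfrac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}} = \dfrac{7}{5}$, and

\[ \left(\frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}\right)\mathbf{a} = \frac{7}{5}\mathbf{a} = \begin{bmatrix} -14/5 \\ 7/5 \end{bmatrix} \]

2. Scale $\mathbf{c}$, multiplying by 3 to get $\mathbf{y} = \begin{bmatrix} 4 \\ -3 \\ 2 \end{bmatrix}$. Compute $\|\mathbf{y}\|^2 = 29$ and $\|\mathbf{y}\| = \sqrt{29}$.
The unit vector in the direction of both $\mathbf{c}$ and $\mathbf{y}$ is

\[ \mathbf{u} = \frac{1}{\|\mathbf{y}\|}\mathbf{y} = \begin{bmatrix} 4/\sqrt{29} \\ -3/\sqrt{29} \\ 2/\sqrt{29} \end{bmatrix} \]

3. $\mathbf{d}$ is orthogonal to $\mathbf{c}$, because

\[ \mathbf{d} \cdot \mathbf{c} = \begin{bmatrix} 5 \\ 6 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 4/3 \\ -1 \\ 2/3 \end{bmatrix} = \frac{20}{3} - 6 - \frac{2}{3} = 0 \]

4. $\mathbf{d}$ is orthogonal to $\mathbf{u}$ because $\mathbf{u}$ has the form $k\mathbf{c}$ for some k, and

\[ \mathbf{d} \cdot \mathbf{u} = \mathbf{d} \cdot (k\mathbf{c}) = k(\mathbf{d} \cdot \mathbf{c}) = k(0) = 0 \]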

6.2 ORTHOGONAL SETS 


A set of vectors $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ in $\mathbb{R}^n$ is said to be an orthogonal set if each pair of distinct
vectors from the set is orthogonal, that is, if $\mathbf{u}_i \cdot \mathbf{u}_j = 0$ whenever $i \ne j$.



FIGURE 1 


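EXAMPLE 1 Show that $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is an orthogonal set, where

\[ \mathbf{u}_1 = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} -1/2 \\ -2 \\ 7/2 \end{bmatrix} \]

SOLUTION Consider the three possible pairs of distinct vectors, namely, $\{\mathbf{u}_1, \mathbf{u}_2\}$,
$\{\mathbf{u}_1, \mathbf{u}_3\}$, and $\{\mathbf{u}_2, \mathbf{u}_3\}$.

\[
\begin{aligned}
\mathbf{u}_1 \cdot \mathbf{u}_2 &= 3(-1) + 1(2) + 1(1) = 0 \\
\mathbf{u}_1 \cdot \mathbf{u}_3 &= 3\left(-\tfrac{1}{2}\right) + 1(-2) + 1\left(\tfrac{7}{2}\right) = 0 \\
\mathbf{u}_2 \cdot \mathbf{u}_3 &= -1\left(-\tfrac{1}{2}\right) + 2(-2) + 1\left(\tfrac{7}{2}\right) = 0
\end{aligned}
\]

Each pair of distinct vectors is orthogonal, and so $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is an orthogonal set. See
Fig. 1; the three line segments there are mutually perpendicular. ■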


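THEOREM 4 If $S = \{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is an orthogonal set of nonzero vectors in $\mathbb{R}^n$, then S is
linearly independent and hence is a basis for the subspace spanned by S.


PROOF If $\mathbf{0} = c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p$ for some scalars $c_1, \ldots, c_p$, then

\[
\begin{aligned}
0 = \mathbf{0} \cdot \mathbf{u}_1 &= (c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_p\mathbf{u}_p) \cdot \mathbf{u}_1 \\
&= (c_1\mathbf{u}_1) \cdot \mathbf{u}_1 + (c_2\mathbf{u}_2) \cdot \mathbf{u}_1 + \cdots + (c_p\mathbf{u}_p) \cdot \mathbf{u}_1 \\
&= c_1(\mathbf{u}_1 \cdot \mathbf{u}_1) + c_2(\mathbf{u}_2 \cdot \mathbf{u}_1) + \cdots + c_p(\mathbf{u}_p \cdot \mathbf{u}_1) \\
&= c_1(\mathbf{u}_1 \cdot \mathbf{u}_1)
\end{aligned}
\]

because $\mathbf{u}_1$ is orthogonal to $\mathbf{u}_2, \ldots, \mathbf{u}_p$. Since $\mathbf{u}_1$ is nonzero, $\mathbf{u}_1 \cdot \mathbf{u}_1$ is not zero and so
$c_1 = 0$. Similarly, $c_2, \ldots, c_p$ must be zero. Thus S is linearly independent. ■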
DEFINITION An orthogonal basis for a subspace W of $\mathbb{R}^n$ is a basis for W that is also an
orthogonal set.


The next theorem suggests why an orthogonal basis is much nicer than other bases. 
The weights in a linear combination can be computed easily. 


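THEOREM 5

Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ be an orthogonal basis for a subspace W of $\mathbb{R}^n$. For each $\mathbf{y}$ in
W, the weights in the linear combination

\[ \mathbf{y} = c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p \]

are given by

\[ c_j = \frac{\mathbf{y} \cdot \mathbf{u}_j}{\mathbf{u}_j \cdot \mathbf{u}_j} \qquad (j = 1, \ldots, p) \]


PROOF As in the preceding proof, the orthogonality of $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ shows that

\[ \mathbf{y} \cdot \mathbf{u}_1 = (c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_p\mathbf{u}_p) \cdot \mathbf{u}_1 = c_1(\mathbf{u}_1 \cdot \mathbf{u}_1) \]

Since $\mathbf{u}_1 \cdot \mathbf{u}_1$ is not zero, the equation above can be solved for $c_1$. To find $c_j$ for
$j = 2, \ldots, p$, compute $\mathbf{y} \cdot \mathbf{u}_j$ and solve for $c_j$. ■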


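EXAMPLE 2 The set $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ in Example 1 is an orthogonal basis for $\mathbb{R}^3$.
Express the vector $\mathbf{y} = \begin{bmatrix} 6 \\ 1 \\ -8 \end{bmatrix}$ as a linear combination of the vectors in S.


SOLUTION Compute

\[
\begin{aligned}
\mathbf{y} \cdot \mathbf{u}_1 &= 11, & \mathbf{y} \cdot \mathbf{u}_2 &= -12, & \mathbf{y} \cdot \mathbf{u}_3 &= -33 \\
\mathbf{u}_1 \cdot \mathbf{u}_1 &= 11, & \mathbf{u}_2 \cdot \mathbf{u}_2 &= 6, & \mathbf{u}_3 \cdot \mathbf{u}_3 &= 33/2
\end{aligned}
\]

By Theorem 5,

\[
\begin{aligned}
\mathbf{y} &= \frac{\mathbf{y} \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\mathbf{u}_1 + \frac{\mathbf{y} \cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\mathbf{u}_2 + \frac{\mathbf{y} \cdot \mathbf{u}_3}{\mathbf{u}_3 \cdot \mathbf{u}_3}\mathbf{u}_3 \\
&= \frac{11}{11}\mathbf{u}_1 + \frac{-12}{6}\mathbf{u}_2 + \frac{-33}{33/2}\mathbf{u}_3 \\
&= \mathbf{u}_1 - 2\mathbf{u}_2 - 2\mathbf{u}_3
\end{aligned}
\]

■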


Notice how easy it is to compute the weights needed to build y from an orthogonal 
basis. If the basis were not orthogonal, it would be necessary to solve a system of linear 
equations in order to find the weights, as in Chapter 1. 
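The weight formula of Theorem 5 can be sketched with exact rational arithmetic, using the data of Examples 1 and 2. The function name `weights` is an assumption for this illustration, and the formula is valid only when the given vectors form an orthogonal set of nonzero vectors whose span contains $\mathbf{y}$.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def weights(y, basis):
    """Weights c_j = (y . u_j) / (u_j . u_j) for an orthogonal basis (Theorem 5)."""
    return [Fraction(dot(y, u)) / Fraction(dot(u, u)) for u in basis]

u1 = [3, 1, 1]
u2 = [-1, 2, 1]
u3 = [Fraction(-1, 2), -2, Fraction(7, 2)]
y = [6, 1, -8]

c = weights(y, [u1, u2, u3])
# Rebuild y from the weights as a check: c1*u1 + c2*u2 + c3*u3
rebuilt = [sum(cj * uj[i] for cj, uj in zip(c, [u1, u2, u3])) for i in range(3)]
print(c, rebuilt)
```

No linear system is solved here: each weight needs only two inner products, which is the point of the remark above about orthogonal bases.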

We turn next to a construction that will become a key step in many calculations 
involving orthogonality, and it will lead to a geometric interpretation of Theorem 5. 


An Orthogonal Projection 

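Given a nonzero vector $\mathbf{u}$ in $\mathbb{R}^n$, consider the problem of decomposing a vector $\mathbf{y}$ in $\mathbb{R}^n$
into the sum of two vectors, one a multiple of $\mathbf{u}$ and the other orthogonal to $\mathbf{u}$. We wish
to write

\[ \mathbf{y} = \hat{\mathbf{y}} + \mathbf{z} \qquad (1) \]

where $\hat{\mathbf{y}} = \alpha\mathbf{u}$ for some scalar $\alpha$ and $\mathbf{z}$ is some vector orthogonal to $\mathbf{u}$. See Fig. 2. Given
any scalar $\alpha$, let $\mathbf{z} = \mathbf{y} - \alpha\mathbf{u}$, so that (1) is satisfied. Then $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to $\mathbf{u}$ if and
only if

\[ 0 = (\mathbf{y} - \alpha\mathbf{u}) \cdot \mathbf{u} = \mathbf{y} \cdot \mathbf{u} - (\alpha\mathbf{u}) \cdot \mathbf{u} = \mathbf{y} \cdot \mathbf{u} - \alpha(\mathbf{u} \cdot \mathbf{u}) \]

That is, (1) is satisfied with $\mathbf{z}$ orthogonal to $\mathbf{u}$ if and only if $\alpha = \dfrac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}$ and $\hat{\mathbf{y}} = \dfrac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\mathbf{u}$.
The vector $\hat{\mathbf{y}}$ is called the orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$, and the vector $\mathbf{z}$ is called
the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$.

FIGURE 2 Finding $\alpha$ to make $\mathbf{y} - \hat{\mathbf{y}}$ orthogonal to $\mathbf{u}$.

If c is any nonzero scalar and if $\mathbf{u}$ is replaced by $c\mathbf{u}$ in the definition of $\hat{\mathbf{y}}$, then the
orthogonal projection of $\mathbf{y}$ onto $c\mathbf{u}$ is exactly the same as the orthogonal projection of $\mathbf{y}$
onto $\mathbf{u}$ (Exercise 31). Hence this projection is determined by the subspace L spanned
by $\mathbf{u}$ (the line through $\mathbf{u}$ and $\mathbf{0}$). Sometimes $\hat{\mathbf{y}}$ is denoted by $\operatorname{proj}_L \mathbf{y}$ and is called the
orthogonal projection of $\mathbf{y}$ onto L. That is,

\[ \hat{\mathbf{y}} = \operatorname{proj}_L \mathbf{y} = \frac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\mathbf{u} \qquad (2) \]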

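EXAMPLE 3 Let $\mathbf{y} = \begin{bmatrix} 7 \\ 6 \end{bmatrix}$ and $\mathbf{u} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$. Find the orthogonal projection of $\mathbf{y}$
onto $\mathbf{u}$. Then write $\mathbf{y}$ as the sum of two orthogonal vectors, one in Span$\{\mathbf{u}\}$ and one
orthogonal to $\mathbf{u}$.


SOLUTION Compute

\[ \mathbf{y} \cdot \mathbf{u} = \begin{bmatrix} 7 \\ 6 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ 2 \end{bmatrix} = 40, \qquad \mathbf{u} \cdot \mathbf{u} = \begin{bmatrix} 4 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ 2 \end{bmatrix} = 20 \]

The orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$ is

\[ \hat{\mathbf{y}} = \frac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\mathbf{u} = \frac{40}{20}\mathbf{u} = 2\begin{bmatrix} 4 \\ 2 \end{bmatrix} = \begin{bmatrix} 8 \\ 4 \end{bmatrix} \]

and the component of $\mathbf{y}$ orthogonal to $\mathbf{u}$ is

\[ \mathbf{y} - \hat{\mathbf{y}} = \begin{bmatrix} 7 \\ 6 \end{bmatrix} - \begin{bmatrix} 8 \\ 4 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix} \]

The sum of these two vectors is $\mathbf{y}$. That is,

\[ \underbrace{\begin{bmatrix} 7 \\ 6 \end{bmatrix}}_{\mathbf{y}} = \underbrace{\begin{bmatrix} 8 \\ 4 \end{bmatrix}}_{\hat{\mathbf{y}}} + \underbrace{\begin{bmatrix} -1 \\ 2 \end{bmatrix}}_{(\mathbf{y} - \hat{\mathbf{y}})} \]

This decomposition of $\mathbf{y}$ is illustrated in Fig. 3. Note: If the calculations above are
correct, then $\{\hat{\mathbf{y}}, \mathbf{y} - \hat{\mathbf{y}}\}$ will be an orthogonal set. As a check, compute

\[ \hat{\mathbf{y}} \cdot (\mathbf{y} - \hat{\mathbf{y}}) = \begin{bmatrix} 8 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 2 \end{bmatrix} = -8 + 8 = 0 \]

■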
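The decomposition in Example 3 follows formula (2) step by step, which suggests the short sketch below. The name `project` is hypothetical, and the use of `Fraction` assumes integer entries so that the arithmetic stays exact.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(y, u):
    """Return (y_hat, z): the orthogonal projection of y onto the line spanned
    by u, and the component z = y - y_hat orthogonal to u. Integer entries assumed."""
    uu = dot(u, u)
    if uu == 0:
        raise ValueError("u must be nonzero")
    alpha = Fraction(dot(y, u), uu)           # alpha = (y . u) / (u . u)
    y_hat = [alpha * ui for ui in u]
    z = [yi - hi for yi, hi in zip(y, y_hat)]
    return y_hat, z

y_hat, z = project([7, 6], [4, 2])
print(y_hat, z, dot(y_hat, z))
```

Replacing `u` by a nonzero multiple such as `[12, 6]` leaves `y_hat` unchanged, in line with the remark that the projection depends only on the line L.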


Since the line segment in Fig. 3 between $\mathbf{y}$ and $\hat{\mathbf{y}}$ is perpendicular to L, by construction
of $\hat{\mathbf{y}}$, the point identified with $\hat{\mathbf{y}}$ is the closest point of L to $\mathbf{y}$. (This can be proved
from geometry. We will assume this for $\mathbb{R}^2$ now and prove it for $\mathbb{R}^n$ in Section 6.3.)

FIGURE 3 The orthogonal projection of $\mathbf{y}$ onto a
line L through the origin.


EXAMPLE 4 Find the distance in Fig. 3 from $\mathbf{y}$ to L.

SOLUTION The distance from $\mathbf{y}$ to L is the length of the perpendicular line segment
from $\mathbf{y}$ to the orthogonal projection $\hat{\mathbf{y}}$. This length equals the length of $\mathbf{y} - \hat{\mathbf{y}}$. Thus the
distance is

\[ \|\mathbf{y} - \hat{\mathbf{y}}\| = \sqrt{(-1)^2 + 2^2} = \sqrt{5} \]

■


A Geometric Interpretation of Theorem 5 


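The formula for the orthogonal projection $\hat{\mathbf{y}}$ in (2) has the same appearance as each of the
terms in Theorem 5. Thus Theorem 5 decomposes a vector $\mathbf{y}$ into a sum of orthogonal
projections onto one-dimensional subspaces.

It is easy to visualize the case in which $W = \mathbb{R}^2 = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$, with $\mathbf{u}_1$ and $\mathbf{u}_2$
orthogonal. Any $\mathbf{y}$ in $\mathbb{R}^2$ can be written in the form

\[ \mathbf{y} = \frac{\mathbf{y} \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\mathbf{u}_1 + \frac{\mathbf{y} \cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\mathbf{u}_2 \qquad (3) \]

The first term in (3) is the projection of $\mathbf{y}$ onto the subspace spanned by $\mathbf{u}_1$ (the line
through $\mathbf{u}_1$ and the origin), and the second term is the projection of $\mathbf{y}$ onto the subspace
spanned by $\mathbf{u}_2$. Thus (3) expresses $\mathbf{y}$ as the sum of its projections onto the (orthogonal)
axes determined by $\mathbf{u}_1$ and $\mathbf{u}_2$. See Fig. 4.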



FIGURE 4 A vector decomposed into 
the sum of two projections. 


Theorem 5 decomposes each $\mathbf{y}$ in Span$\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ into the sum of p projections
onto one-dimensional subspaces that are mutually orthogonal.

Decomposing a Force into Component Forces

The decomposition in Fig. 4 can occur in physics when some sort of force is applied to an
object. Choosing an appropriate coordinate system allows the force to be represented
by a vector $\mathbf{y}$ in $\mathbb{R}^2$ or $\mathbb{R}^3$. Often the problem involves some particular direction of
interest, which is represented by another vector u. For instance, if the object is moving 
in a straight line when the force is applied, the vector u might point in the direction 
of movement, as in Fig. 5. A key step in the problem is to decompose the force into 
a component in the direction of u and a component orthogonal to u. The calculations 
would be analogous to those made in Example 3 above. 



FIGURE 5 


Orthonormal Sets 

A set $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is an orthonormal set if it is an orthogonal set of unit vectors. If W
is the subspace spanned by such a set, then $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is an orthonormal basis for
W, since the set is automatically linearly independent, by Theorem 4.

The simplest example of an orthonormal set is the standard basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ for
$\mathbb{R}^n$. Any nonempty subset of $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ is orthonormal, too. Here is a more complicated
example.


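EXAMPLE 5 Show that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthonormal basis of $\mathbb{R}^3$, where

\[ \mathbf{v}_1 = \begin{bmatrix} 3/\sqrt{11} \\ 1/\sqrt{11} \\ 1/\sqrt{11} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -1/\sqrt{6} \\ 2/\sqrt{6} \\ 1/\sqrt{6} \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -1/\sqrt{66} \\ -4/\sqrt{66} \\ 7/\sqrt{66} \end{bmatrix} \]

SOLUTION Compute

\[ \mathbf{v}_1\cdot\mathbf{v}_2 = -3/\sqrt{66} + 2/\sqrt{66} + 1/\sqrt{66} = 0 \]
\[ \mathbf{v}_1\cdot\mathbf{v}_3 = -3/\sqrt{726} - 4/\sqrt{726} + 7/\sqrt{726} = 0 \]
\[ \mathbf{v}_2\cdot\mathbf{v}_3 = 1/\sqrt{396} - 8/\sqrt{396} + 7/\sqrt{396} = 0 \]

Thus $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal set. Also,

\[ \mathbf{v}_1\cdot\mathbf{v}_1 = 9/11 + 1/11 + 1/11 = 1 \]
\[ \mathbf{v}_2\cdot\mathbf{v}_2 = 1/6 + 4/6 + 1/6 = 1 \]
\[ \mathbf{v}_3\cdot\mathbf{v}_3 = 1/66 + 16/66 + 49/66 = 1 \]

which shows that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are unit vectors. Thus $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthonormal set. Since the set is linearly independent, its three vectors form a basis for $\mathbb{R}^3$. See Fig. 6. ■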
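The hand computation in Example 5 can be mirrored numerically. The following is a minimal sketch, not part of the text's method: the helper `is_orthonormal` and its tolerance `tol` are assumptions introduced here, and floating-point arithmetic means the dot products are compared with 0 and 1 only up to that tolerance.

```python
import math


def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def is_orthonormal(vectors, tol=1e-12):
    """Return True if every pair has dot product 0 and every vector has length 1."""
    for i, vi in enumerate(vectors):
        for j, vj in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(dot(vi, vj) - expected) > tol:
                return False
    return True


# The three vectors of Example 5.
s11, s6, s66 = math.sqrt(11), math.sqrt(6), math.sqrt(66)
v1 = [3 / s11, 1 / s11, 1 / s11]
v2 = [-1 / s6, 2 / s6, 1 / s6]
v3 = [-1 / s66, -4 / s66, 7 / s66]

example5_ok = is_orthonormal([v1, v2, v3])
```

The same check applied to an orthogonal but unnormalized set, such as `[[1, 1], [1, -1]]`, returns `False` because the vectors do not have unit length.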
When the vectors in an orthogonal set of nonzero vectors are normalized to have unit length, the new vectors will still be orthogonal, and hence the new set will be an orthonormal set. See Exercise 32. It is easy to check that the vectors in Fig. 6 (Example 5) are simply the unit vectors in the directions of the vectors in Fig. 1 (Example 1).

Matrices whose columns form an orthonormal set are important in applications and in computer algorithms for matrix computations. Their main properties are given in Theorems 6 and 7.


THEOREM 6

An $m \times n$ matrix $U$ has orthonormal columns if and only if $U^T U = I$.


PROOF To simplify notation, we suppose that $U$ has only three columns, each a vector in $\mathbb{R}^m$. The proof of the general case is essentially the same. Let $U = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \mathbf{u}_3\,]$ and compute

\[ U^T U = \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \mathbf{u}_3^T \end{bmatrix} [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \mathbf{u}_3\,] = \begin{bmatrix} \mathbf{u}_1^T\mathbf{u}_1 & \mathbf{u}_1^T\mathbf{u}_2 & \mathbf{u}_1^T\mathbf{u}_3 \\ \mathbf{u}_2^T\mathbf{u}_1 & \mathbf{u}_2^T\mathbf{u}_2 & \mathbf{u}_2^T\mathbf{u}_3 \\ \mathbf{u}_3^T\mathbf{u}_1 & \mathbf{u}_3^T\mathbf{u}_2 & \mathbf{u}_3^T\mathbf{u}_3 \end{bmatrix} \qquad (4) \]

The entries in the matrix at the right are inner products, using transpose notation. The columns of $U$ are orthogonal if and only if

\[ \mathbf{u}_1^T\mathbf{u}_2 = \mathbf{u}_2^T\mathbf{u}_1 = 0, \quad \mathbf{u}_1^T\mathbf{u}_3 = \mathbf{u}_3^T\mathbf{u}_1 = 0, \quad \mathbf{u}_2^T\mathbf{u}_3 = \mathbf{u}_3^T\mathbf{u}_2 = 0 \qquad (5) \]

The columns of $U$ all have unit length if and only if

\[ \mathbf{u}_1^T\mathbf{u}_1 = 1, \quad \mathbf{u}_2^T\mathbf{u}_2 = 1, \quad \mathbf{u}_3^T\mathbf{u}_3 = 1 \qquad (6) \]

The theorem follows immediately from (4)-(6). ■


THEOREM 7

Let $U$ be an $m \times n$ matrix with orthonormal columns, and let $\mathbf{x}$ and $\mathbf{y}$ be in $\mathbb{R}^n$. Then

a. $\|U\mathbf{x}\| = \|\mathbf{x}\|$

b. $(U\mathbf{x})\cdot(U\mathbf{y}) = \mathbf{x}\cdot\mathbf{y}$

c. $(U\mathbf{x})\cdot(U\mathbf{y}) = 0$ if and only if $\mathbf{x}\cdot\mathbf{y} = 0$


Properties (a) and (c) say that the linear mapping $\mathbf{x} \mapsto U\mathbf{x}$ preserves lengths and orthogonality. These properties are crucial for many computer algorithms. See Exercise 25 for the proof of Theorem 7.




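EXAMPLE 6 Let $U = \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix}$ and $\mathbf{x} = \begin{bmatrix} \sqrt{2} \\ 3 \end{bmatrix}$. Notice that $U$ has orthonormal columns and

\[ U^T U = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 2/3 & -2/3 & 1/3 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

Verify that $\|U\mathbf{x}\| = \|\mathbf{x}\|$.

SOLUTION

\[ U\mathbf{x} = \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix} \begin{bmatrix} \sqrt{2} \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix} \]

\[ \|U\mathbf{x}\| = \sqrt{9 + 1 + 1} = \sqrt{11} \]

\[ \|\mathbf{x}\| = \sqrt{2 + 9} = \sqrt{11} \] ■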
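Example 6 is small enough to check by machine. The sketch below, with hypothetical helper names `matvec` and `norm`, multiplies $U$ by $\mathbf{x}$ and compares the two lengths; it uses floating-point numbers, so equality holds only up to rounding.

```python
import math


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


r2 = math.sqrt(2)
U = [[1 / r2, 2 / 3],
     [1 / r2, -2 / 3],
     [0.0, 1 / 3]]
x = [r2, 3.0]

Ux = matvec(U, x)          # approximately [3, -1, 1]
lengths_agree = math.isclose(norm(Ux), norm(x))
```

Both lengths are approximately $\sqrt{11}$, as property (a) of Theorem 7 predicts for a matrix with orthonormal columns.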


Theorems 6 and 7 are particularly useful when applied to square matrices. An orthogonal matrix is a square invertible matrix $U$ such that $U^{-1} = U^T$. By Theorem 6, such a matrix has orthonormal columns.$^1$ It is easy to see that any square matrix with orthonormal columns is an orthogonal matrix. Surprisingly, such a matrix must have orthonormal rows, too. See Exercises 27 and 28. Orthogonal matrices will appear frequently in Chapter 7.


EXAMPLE 7 The matrix

\[ U = \begin{bmatrix} 3/\sqrt{11} & -1/\sqrt{6} & -1/\sqrt{66} \\ 1/\sqrt{11} & 2/\sqrt{6} & -4/\sqrt{66} \\ 1/\sqrt{11} & 1/\sqrt{6} & 7/\sqrt{66} \end{bmatrix} \]

is an orthogonal matrix because it is square and because its columns are orthonormal, by Example 5. Verify that the rows are orthonormal, too! ■


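PRACTICE PROBLEMS

1. Let $\mathbf{u}_1 = \begin{bmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}$ and $\mathbf{u}_2 = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}$. Show that $\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthonormal basis for $\mathbb{R}^2$.

2. Let $\mathbf{y}$ and $L$ be as in Example 3 and Fig. 3. Compute the orthogonal projection $\hat{\mathbf{y}}$ of $\mathbf{y}$ onto $L$ using $\mathbf{u} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ instead of the $\mathbf{u}$ in Example 3.

3. Let $U$ and $\mathbf{x}$ be as in Example 6, and let $\mathbf{y} = \begin{bmatrix} -3\sqrt{2} \\ 6 \end{bmatrix}$. Verify that $U\mathbf{x}\cdot U\mathbf{y} = \mathbf{x}\cdot\mathbf{y}$.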


6.2 EXERCISES 

In Exercises 1-6, determine which sets of vectors are orthogonal.

1. $(-1, 4, -3)$, $(5, 2, 1)$, $(3, -4, -7)$

2. $(1, -2, 1)$, $(0, 1, 2)$, $(-5, -2, 1)$

3. $(2, -7, -1)$, $(-6, -3, 9)$, $(3, 1, -1)$

4. $(2, -5, -3)$, $(0, 0, 0)$, $(4, -2, 6)$



In Exercises 7-10, show that $\{\mathbf{u}_1, \mathbf{u}_2\}$ or $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is an orthogonal basis for $\mathbb{R}^2$ or $\mathbb{R}^3$, respectively. Then express $\mathbf{x}$ as a linear combination of the $\mathbf{u}$'s.


7 . Ui 


,u 2 


,and x 


9 

-7 


1 A better name might be orthonormal matrix, and this term is found in some statistics texts. However, 
orthogonal matrix is the standard term in linear algebra. 
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3/vTO' 

-l/V^O 

L-1/V20. 

1 /V 2 
0 

- 1 /V 2 


0 . 
- 1 /V 2 
1 /V 2 


-2/3 

1/3 

-2/3 


In Exercises 23 and 24, all vectors are in $\mathbb{R}^n$. Mark each statement True or False. Justify each answer.

23. a. Not every linearly independent set in $\mathbb{R}^n$ is an orthogonal set.

b. If $\mathbf{y}$ is a linear combination of nonzero vectors from an orthogonal set, then the weights in the linear combination can be computed without row operations on a matrix.

c. If the vectors in an orthogonal set of nonzero vectors are normalized, then some of the new vectors may not be orthogonal.

d. A matrix with orthonormal columns is an orthogonal matrix.

e. If $L$ is a line through $\mathbf{0}$ and if $\hat{\mathbf{y}}$ is the orthogonal projection of $\mathbf{y}$ onto $L$, then $\|\hat{\mathbf{y}}\|$ gives the distance from $\mathbf{y}$ to $L$.

24. a. Not every orthogonal set in $\mathbb{R}^n$ is linearly independent.

b. If a set $S = \{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ has the property that $\mathbf{u}_i\cdot\mathbf{u}_j = 0$ whenever $i \neq j$, then $S$ is an orthonormal set.

c. If the columns of an $m \times n$ matrix $A$ are orthonormal, then the linear mapping $\mathbf{x} \mapsto A\mathbf{x}$ preserves lengths.

d. The orthogonal projection of $\mathbf{y}$ onto $\mathbf{v}$ is the same as the orthogonal projection of $\mathbf{y}$ onto $c\mathbf{v}$ whenever $c \neq 0$.

e. An orthogonal matrix is invertible.

25. Prove Theorem 7. [Hint: For (a), compute $\|U\mathbf{x}\|^2$, or prove (b) first.]

26. Suppose $W$ is a subspace of $\mathbb{R}^n$ spanned by $n$ nonzero orthogonal vectors. Explain why $W = \mathbb{R}^n$.

27. Let $U$ be a square matrix with orthonormal columns. Explain why $U$ is invertible. (Mention the theorems you use.)

28. Let $U$ be an $n \times n$ orthogonal matrix. Show that the rows of $U$ form an orthonormal basis of $\mathbb{R}^n$.

29. Let $U$ and $V$ be $n \times n$ orthogonal matrices. Explain why $UV$ is an orthogonal matrix. [That is, explain why $UV$ is invertible and its inverse is $(UV)^T$.]

30. Let $U$ be an orthogonal matrix, and construct $V$ by interchanging some of the columns of $U$. Explain why $V$ is an orthogonal matrix.

31. Show that the orthogonal projection of a vector $\mathbf{y}$ onto a line $L$ through the origin in $\mathbb{R}^2$ does not depend on the choice of the nonzero $\mathbf{u}$ in $L$ used in the formula for $\hat{\mathbf{y}}$. To do this, suppose $\mathbf{y}$ and $\mathbf{u}$ are given and $\hat{\mathbf{y}}$ has been computed by formula (2) in this section. Replace $\mathbf{u}$ in that formula by $c\mathbf{u}$, where $c$ is an unspecified nonzero scalar. Show that the new formula gives the same $\hat{\mathbf{y}}$.

32. Let $\{\mathbf{v}_1, \mathbf{v}_2\}$ be an orthogonal set of nonzero vectors, and let $c_1$, $c_2$ be any nonzero scalars. Show that $\{c_1\mathbf{v}_1, c_2\mathbf{v}_2\}$ is also an orthogonal set. Since orthogonality of a set is defined in terms of pairs of vectors, this shows that if the vectors in an orthogonal set are normalized, the new set will still be orthogonal.

33. Given $\mathbf{u} \neq \mathbf{0}$ in $\mathbb{R}^n$, let $L = \operatorname{Span}\{\mathbf{u}\}$. Show that the mapping $\mathbf{x} \mapsto \operatorname{proj}_L \mathbf{x}$ is a linear transformation.

34. Given $\mathbf{u} \neq \mathbf{0}$ in $\mathbb{R}^n$, let $L = \operatorname{Span}\{\mathbf{u}\}$. For $\mathbf{y}$ in $\mathbb{R}^n$, the reflection of $\mathbf{y}$ in $L$ is the point $\operatorname{refl}_L \mathbf{y}$ defined by


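8. $\mathbf{u}_1 = $ , $\mathbf{u}_2 = $ , and $\mathbf{x} = $

9. $\mathbf{u}_1 = (1, 0, 1)$, $\mathbf{u}_2 = (-1, 4, 1)$, $\mathbf{u}_3 = (2, 1, -2)$, and $\mathbf{x} = (8, -4, -3)$

10. $\mathbf{u}_1 = (3, -3, 0)$, $\mathbf{u}_2 = (2, 2, -1)$, $\mathbf{u}_3 = (1, 1, 4)$, and $\mathbf{x} = (5, -3, 1)$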


11. Compute the orthogonal projection of 


through 


and the origin. 


12. Compute the orthogonal projection of 


onto the line 


onto the line 


through 


13. Let y : 


and the origin. 


2 


and u : 


.Write y as the sum of two 


orthogonal vectors, one in Span {u} and one orthogonal to u. 
7_ 


14. Let y 


and u : 


.Write y as the sum of a vector 


in Span {u} and a vector orthogonal to u. 


15. Let y : 


and u : 


6 


.Compute the distance from y 


to the line through u and the origin. 


16. Let y : 


and u : 


2 


.Compute the distance from y 


to the line through u and the origin. 

In Exercises 17-22, determine which sets of vectors are orthonormal. If a set is only orthogonal, normalize the vectors to produce an orthonormal set.


17. 


19. 


1/3. 

1/3 

1/3 


- 1 / 2 ' 

0 

1/2 


-.6] r.8" 

•8」， L .6_ 


18. 


20 . 


0 


—2/3. 

1/3 

2/3 


0 


1/3 _ 
2/3 
0 


10 10 10 18 18 18 

/r/2/2/r/r/r 

////// 

13 3 1 4 ■ — I 


21 


22 
\[ \operatorname{refl}_L \mathbf{y} = 2\,\operatorname{proj}_L \mathbf{y} - \mathbf{y} \]

See the figure, which shows that $\operatorname{refl}_L \mathbf{y}$ is the sum of $\hat{\mathbf{y}} = \operatorname{proj}_L \mathbf{y}$ and $\hat{\mathbf{y}} - \mathbf{y}$. Show that the mapping $\mathbf{y} \mapsto \operatorname{refl}_L \mathbf{y}$ is a linear transformation.

FIGURE The reflection of $\mathbf{y}$ in a line through the origin.

35. [M] Show that the columns of the matrix $A$ are orthogonal by making an appropriate matrix calculation. State the calculation you use.

36. [M] In parts (a)-(d), let $U$ be the matrix formed by normalizing each column of the matrix $A$ in Exercise 35.

a. Compute $U^T U$ and $UU^T$. How do they differ?

b. Generate a random vector $\mathbf{y}$ in $\mathbb{R}^8$, and compute $\mathbf{p} = UU^T\mathbf{y}$ and $\mathbf{z} = \mathbf{y} - \mathbf{p}$. Explain why $\mathbf{p}$ is in $\operatorname{Col} A$. Verify that $\mathbf{z}$ is orthogonal to $\mathbf{p}$.

c. Verify that $\mathbf{z}$ is orthogonal to each column of $U$.

d. Notice that $\mathbf{y} = \mathbf{p} + \mathbf{z}$, with $\mathbf{p}$ in $\operatorname{Col} A$. Explain why $\mathbf{z}$ is in $(\operatorname{Col} A)^{\perp}$. (The significance of this decomposition of $\mathbf{y}$ will be explained in the next section.)


SG 


Mastering: Orthogonal 
Basis 6-4 


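SOLUTIONS TO PRACTICE PROBLEMS

1. The vectors are orthogonal because

\[ \mathbf{u}_1\cdot\mathbf{u}_2 = -2/5 + 2/5 = 0 \]

They are unit vectors because

\[ \|\mathbf{u}_1\|^2 = (-1/\sqrt{5})^2 + (2/\sqrt{5})^2 = 1/5 + 4/5 = 1 \]
\[ \|\mathbf{u}_2\|^2 = (2/\sqrt{5})^2 + (1/\sqrt{5})^2 = 4/5 + 1/5 = 1 \]

In particular, the set $\{\mathbf{u}_1, \mathbf{u}_2\}$ is linearly independent, and hence is a basis for $\mathbb{R}^2$ since there are two vectors in the set.

2. When $\mathbf{y} = \begin{bmatrix} 7 \\ 6 \end{bmatrix}$ and $\mathbf{u} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$,

\[ \hat{\mathbf{y}} = \frac{\mathbf{y}\cdot\mathbf{u}}{\mathbf{u}\cdot\mathbf{u}}\,\mathbf{u} = \frac{20}{5}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = 4\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ 4 \end{bmatrix} \]

This is the same $\hat{\mathbf{y}}$ found in Example 3. The orthogonal projection does not seem to depend on the $\mathbf{u}$ chosen on the line. See Exercise 31.

3. $U\mathbf{y} = \begin{bmatrix} 1/\sqrt{2} & 2/3 \\ 1/\sqrt{2} & -2/3 \\ 0 & 1/3 \end{bmatrix} \begin{bmatrix} -3\sqrt{2} \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \\ -7 \\ 2 \end{bmatrix}$

Also, from Example 6, $\mathbf{x} = \begin{bmatrix} \sqrt{2} \\ 3 \end{bmatrix}$ and $U\mathbf{x} = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}$. Hence

\[ U\mathbf{x}\cdot U\mathbf{y} = 3 + 7 + 2 = 12, \quad \text{and} \quad \mathbf{x}\cdot\mathbf{y} = -6 + 18 = 12 \]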


6 2 1 
_ _ I 


61362321 

32631612 
I - I I 

61362321 
6.3 ORTHOGONAL PROJECTIONS


The orthogonal projection of a point in $\mathbb{R}^2$ onto a line through the origin has an important analogue in $\mathbb{R}^n$. Given a vector $\mathbf{y}$ and a subspace $W$ in $\mathbb{R}^n$, there is a vector $\hat{\mathbf{y}}$ in $W$ such that (1) $\hat{\mathbf{y}}$ is the unique vector in $W$ for which $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to $W$, and (2) $\hat{\mathbf{y}}$ is the unique vector in $W$ closest to $\mathbf{y}$. See Fig. 1. These two properties of $\hat{\mathbf{y}}$ provide the key to finding least-squares solutions of linear systems, mentioned in the introductory example for this chapter. The full story will be told in Section 6.5.

FIGURE 1

To prepare for the first theorem, observe that whenever a vector $\mathbf{y}$ is written as a linear combination of vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$ in $\mathbb{R}^n$, the terms in the sum for $\mathbf{y}$ can be grouped into two parts so that $\mathbf{y}$ can be written as

\[ \mathbf{y} = \mathbf{z}_1 + \mathbf{z}_2 \]

where $\mathbf{z}_1$ is a linear combination of some of the $\mathbf{u}_i$ and $\mathbf{z}_2$ is a linear combination of the rest of the $\mathbf{u}_i$. This idea is particularly useful when $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ is an orthogonal basis. Recall from Section 6.1 that $W^{\perp}$ denotes the set of all vectors orthogonal to a subspace $W$.

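EXAMPLE 1 Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_5\}$ be an orthogonal basis for $\mathbb{R}^5$ and let

\[ \mathbf{y} = c_1\mathbf{u}_1 + \cdots + c_5\mathbf{u}_5 \]

Consider the subspace $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$, and write $\mathbf{y}$ as the sum of a vector $\mathbf{z}_1$ in $W$ and a vector $\mathbf{z}_2$ in $W^{\perp}$.

SOLUTION Write

\[ \mathbf{y} = \underbrace{c_1\mathbf{u}_1 + c_2\mathbf{u}_2}_{\mathbf{z}_1} + \underbrace{c_3\mathbf{u}_3 + c_4\mathbf{u}_4 + c_5\mathbf{u}_5}_{\mathbf{z}_2} \]

where $\mathbf{z}_1 = c_1\mathbf{u}_1 + c_2\mathbf{u}_2$ is in $\operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$

and $\mathbf{z}_2 = c_3\mathbf{u}_3 + c_4\mathbf{u}_4 + c_5\mathbf{u}_5$ is in $\operatorname{Span}\{\mathbf{u}_3, \mathbf{u}_4, \mathbf{u}_5\}$.

To show that $\mathbf{z}_2$ is in $W^{\perp}$, it suffices to show that $\mathbf{z}_2$ is orthogonal to the vectors in the basis $\{\mathbf{u}_1, \mathbf{u}_2\}$ for $W$. (See Section 6.1.) Using properties of the inner product, compute

\[ \mathbf{z}_2\cdot\mathbf{u}_1 = (c_3\mathbf{u}_3 + c_4\mathbf{u}_4 + c_5\mathbf{u}_5)\cdot\mathbf{u}_1 = c_3\,\mathbf{u}_3\cdot\mathbf{u}_1 + c_4\,\mathbf{u}_4\cdot\mathbf{u}_1 + c_5\,\mathbf{u}_5\cdot\mathbf{u}_1 = 0 \]

because $\mathbf{u}_1$ is orthogonal to $\mathbf{u}_3$, $\mathbf{u}_4$, and $\mathbf{u}_5$. A similar calculation shows that $\mathbf{z}_2\cdot\mathbf{u}_2 = 0$. Thus $\mathbf{z}_2$ is in $W^{\perp}$. ■


The next theorem shows that the decomposition $\mathbf{y} = \mathbf{z}_1 + \mathbf{z}_2$ in Example 1 can be computed without having an orthogonal basis for $\mathbb{R}^n$. It is enough to have an orthogonal basis only for $W$.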
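THEOREM 8 The Orthogonal Decomposition Theorem

Let $W$ be a subspace of $\mathbb{R}^n$. Then each $\mathbf{y}$ in $\mathbb{R}^n$ can be written uniquely in the form

\[ \mathbf{y} = \hat{\mathbf{y}} + \mathbf{z} \qquad (1) \]

where $\hat{\mathbf{y}}$ is in $W$ and $\mathbf{z}$ is in $W^{\perp}$. In fact, if $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is any orthogonal basis of $W$, then

\[ \hat{\mathbf{y}} = \frac{\mathbf{y}\cdot\mathbf{u}_1}{\mathbf{u}_1\cdot\mathbf{u}_1}\,\mathbf{u}_1 + \cdots + \frac{\mathbf{y}\cdot\mathbf{u}_p}{\mathbf{u}_p\cdot\mathbf{u}_p}\,\mathbf{u}_p \qquad (2) \]

and $\mathbf{z} = \mathbf{y} - \hat{\mathbf{y}}$.


The vector $\hat{\mathbf{y}}$ in (1) is called the orthogonal projection of $\mathbf{y}$ onto $W$ and often is written as $\operatorname{proj}_W \mathbf{y}$. See Fig. 2. When $W$ is a one-dimensional subspace, the formula for $\hat{\mathbf{y}}$ matches the formula given in Section 6.2.

FIGURE 2 The orthogonal projection of $\mathbf{y}$ onto $W$.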


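PROOF Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ be any orthogonal basis for $W$, and define $\hat{\mathbf{y}}$ by (2).$^1$ Then $\hat{\mathbf{y}}$ is in $W$ because $\hat{\mathbf{y}}$ is a linear combination of the basis $\mathbf{u}_1, \ldots, \mathbf{u}_p$. Let $\mathbf{z} = \mathbf{y} - \hat{\mathbf{y}}$. Since $\mathbf{u}_1$ is orthogonal to $\mathbf{u}_2, \ldots, \mathbf{u}_p$, it follows from (2) that

\[ \mathbf{z}\cdot\mathbf{u}_1 = (\mathbf{y} - \hat{\mathbf{y}})\cdot\mathbf{u}_1 = \mathbf{y}\cdot\mathbf{u}_1 - \left(\frac{\mathbf{y}\cdot\mathbf{u}_1}{\mathbf{u}_1\cdot\mathbf{u}_1}\right)\mathbf{u}_1\cdot\mathbf{u}_1 - 0 - \cdots - 0 = \mathbf{y}\cdot\mathbf{u}_1 - \mathbf{y}\cdot\mathbf{u}_1 = 0 \]

Thus $\mathbf{z}$ is orthogonal to $\mathbf{u}_1$. Similarly, $\mathbf{z}$ is orthogonal to each $\mathbf{u}_j$ in the basis for $W$. Hence $\mathbf{z}$ is orthogonal to every vector in $W$. That is, $\mathbf{z}$ is in $W^{\perp}$.

To show that the decomposition in (1) is unique, suppose $\mathbf{y}$ can also be written as $\mathbf{y} = \hat{\mathbf{y}}_1 + \mathbf{z}_1$, with $\hat{\mathbf{y}}_1$ in $W$ and $\mathbf{z}_1$ in $W^{\perp}$. Then $\hat{\mathbf{y}} + \mathbf{z} = \hat{\mathbf{y}}_1 + \mathbf{z}_1$ (since both sides equal $\mathbf{y}$), and so

\[ \hat{\mathbf{y}} - \hat{\mathbf{y}}_1 = \mathbf{z}_1 - \mathbf{z} \]

This equality shows that the vector $\mathbf{v} = \hat{\mathbf{y}} - \hat{\mathbf{y}}_1$ is in $W$ and in $W^{\perp}$ (because $\mathbf{z}_1$ and $\mathbf{z}$ are both in $W^{\perp}$, and $W^{\perp}$ is a subspace). Hence $\mathbf{v}\cdot\mathbf{v} = 0$, which shows that $\mathbf{v} = \mathbf{0}$. This proves that $\hat{\mathbf{y}} = \hat{\mathbf{y}}_1$ and also $\mathbf{z}_1 = \mathbf{z}$. ■

The uniqueness of the decomposition (1) shows that the orthogonal projection $\hat{\mathbf{y}}$ depends only on $W$ and not on the particular basis used in (2).


$^1$ We may assume that $W$ is not the zero subspace, for otherwise $W^{\perp} = \mathbb{R}^n$ and (1) is simply $\mathbf{y} = \mathbf{0} + \mathbf{y}$. The next section will show that any nonzero subspace of $\mathbb{R}^n$ has an orthogonal basis.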
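EXAMPLE 2 Let $\mathbf{u}_1 = \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$, and $\mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$. Observe that $\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthogonal basis for $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$. Write $\mathbf{y}$ as the sum of a vector in $W$ and a vector orthogonal to $W$.


SOLUTION The orthogonal projection of $\mathbf{y}$ onto $W$ is

\[ \hat{\mathbf{y}} = \frac{\mathbf{y}\cdot\mathbf{u}_1}{\mathbf{u}_1\cdot\mathbf{u}_1}\,\mathbf{u}_1 + \frac{\mathbf{y}\cdot\mathbf{u}_2}{\mathbf{u}_2\cdot\mathbf{u}_2}\,\mathbf{u}_2 = \frac{9}{30}\begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix} + \frac{3}{6}\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} = \frac{9}{30}\begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix} + \frac{15}{30}\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2/5 \\ 2 \\ 1/5 \end{bmatrix} \]

Also

\[ \mathbf{y} - \hat{\mathbf{y}} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} - \begin{bmatrix} -2/5 \\ 2 \\ 1/5 \end{bmatrix} = \begin{bmatrix} 7/5 \\ 0 \\ 14/5 \end{bmatrix} \]

Theorem 8 ensures that $\mathbf{y} - \hat{\mathbf{y}}$ is in $W^{\perp}$. To check the calculations, however, it is a good idea to verify that $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to both $\mathbf{u}_1$ and $\mathbf{u}_2$ and hence to all of $W$. The desired decomposition of $\mathbf{y}$ is

\[ \mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} -2/5 \\ 2 \\ 1/5 \end{bmatrix} + \begin{bmatrix} 7/5 \\ 0 \\ 14/5 \end{bmatrix} \]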
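The decomposition in Example 2 can be reproduced exactly with rational arithmetic. The sketch below applies formula (2) of Theorem 8; the function name `project` is an assumption of this illustration, and the code relies on the caller supplying an orthogonal set of nonzero vectors with integer or rational entries (it does not check orthogonality).

```python
from fractions import Fraction


def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def project(y, basis):
    """Orthogonal projection of y onto Span(basis) by formula (2).

    basis must be an orthogonal set of nonzero vectors; this is assumed,
    not verified.
    """
    y_hat = [Fraction(0)] * len(y)
    for u in basis:
        weight = Fraction(dot(y, u), dot(u, u))   # (y.u)/(u.u)
        y_hat = [h + weight * ui for h, ui in zip(y_hat, u)]
    return y_hat


u1, u2, y = [2, 5, -1], [-2, 1, 1], [1, 2, 3]
y_hat = project(y, [u1, u2])                 # the part of y in W
z = [yi - hi for yi, hi in zip(y, y_hat)]    # the part of y in W-perp
```

As the text recommends, a quick check that `dot(z, u1)` and `dot(z, u2)` are both zero confirms the arithmetic.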


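A Geometric Interpretation of the Orthogonal Projection

When $W$ is a one-dimensional subspace, the formula (2) for $\operatorname{proj}_W \mathbf{y}$ contains just one term. Thus, when $\dim W > 1$, each term in (2) is itself an orthogonal projection of $\mathbf{y}$ onto a one-dimensional subspace spanned by one of the $\mathbf{u}$'s in the basis for $W$. Figure 3 illustrates this when $W$ is a subspace of $\mathbb{R}^3$ spanned by $\mathbf{u}_1$ and $\mathbf{u}_2$. Here $\hat{\mathbf{y}}_1$ and $\hat{\mathbf{y}}_2$ denote the projections of $\mathbf{y}$ onto the lines spanned by $\mathbf{u}_1$ and $\mathbf{u}_2$, respectively. The orthogonal projection $\hat{\mathbf{y}}$ of $\mathbf{y}$ onto $W$ is the sum of the projections of $\mathbf{y}$ onto one-dimensional subspaces that are orthogonal to each other. The vector $\hat{\mathbf{y}}$ in Fig. 3 corresponds to the vector $\mathbf{y}$ in Fig. 4 of Section 6.2, because now it is $\hat{\mathbf{y}}$ that is in $W$.

\[ \hat{\mathbf{y}} = \frac{\mathbf{y}\cdot\mathbf{u}_1}{\mathbf{u}_1\cdot\mathbf{u}_1}\,\mathbf{u}_1 + \frac{\mathbf{y}\cdot\mathbf{u}_2}{\mathbf{u}_2\cdot\mathbf{u}_2}\,\mathbf{u}_2 = \hat{\mathbf{y}}_1 + \hat{\mathbf{y}}_2 \]

FIGURE 3 The orthogonal projection of $\mathbf{y}$ is the sum of its projections onto one-dimensional subspaces that are mutually orthogonal.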
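Properties of Orthogonal Projections

If $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is an orthogonal basis for $W$ and if $\mathbf{y}$ happens to be in $W$, then the formula for $\operatorname{proj}_W \mathbf{y}$ is exactly the same as the representation of $\mathbf{y}$ given in Theorem 5 in Section 6.2. In this case, $\operatorname{proj}_W \mathbf{y} = \mathbf{y}$.

If $\mathbf{y}$ is in $W = \operatorname{Span}\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$, then $\operatorname{proj}_W \mathbf{y} = \mathbf{y}$.

This fact also follows from the next theorem.


THEOREM 9 The Best Approximation Theorem

Let $W$ be a subspace of $\mathbb{R}^n$, let $\mathbf{y}$ be any vector in $\mathbb{R}^n$, and let $\hat{\mathbf{y}}$ be the orthogonal projection of $\mathbf{y}$ onto $W$. Then $\hat{\mathbf{y}}$ is the closest point in $W$ to $\mathbf{y}$, in the sense that

\[ \|\mathbf{y} - \hat{\mathbf{y}}\| < \|\mathbf{y} - \mathbf{v}\| \qquad (3) \]

for all $\mathbf{v}$ in $W$ distinct from $\hat{\mathbf{y}}$.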


The vector $\hat{\mathbf{y}}$ in Theorem 9 is called the best approximation to $\mathbf{y}$ by elements of $W$. Later sections in the text will examine problems where a given $\mathbf{y}$ must be replaced, or approximated, by a vector $\mathbf{v}$ in some fixed subspace $W$. The distance from $\mathbf{y}$ to $\mathbf{v}$, given by $\|\mathbf{y} - \mathbf{v}\|$, can be regarded as the "error" of using $\mathbf{v}$ in place of $\mathbf{y}$. Theorem 9 says that this error is minimized when $\mathbf{v} = \hat{\mathbf{y}}$.

Inequality (3) leads to a new proof that $\hat{\mathbf{y}}$ does not depend on the particular orthogonal basis used to compute it. If a different orthogonal basis for $W$ were used to construct an orthogonal projection of $\mathbf{y}$, then this projection would also be the closest point in $W$ to $\mathbf{y}$, namely, $\hat{\mathbf{y}}$.

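PROOF Take $\mathbf{v}$ in $W$ distinct from $\hat{\mathbf{y}}$. See Fig. 4. Then $\hat{\mathbf{y}} - \mathbf{v}$ is in $W$. By the Orthogonal Decomposition Theorem, $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to $W$. In particular, $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to $\hat{\mathbf{y}} - \mathbf{v}$ (which is in $W$). Since

\[ \mathbf{y} - \mathbf{v} = (\mathbf{y} - \hat{\mathbf{y}}) + (\hat{\mathbf{y}} - \mathbf{v}) \]

the Pythagorean Theorem gives

\[ \|\mathbf{y} - \mathbf{v}\|^2 = \|\mathbf{y} - \hat{\mathbf{y}}\|^2 + \|\hat{\mathbf{y}} - \mathbf{v}\|^2 \]

(See the colored right triangle in Fig. 4. The length of each side is labeled.) Now $\|\hat{\mathbf{y}} - \mathbf{v}\|^2 > 0$ because $\hat{\mathbf{y}} - \mathbf{v} \neq \mathbf{0}$, and so inequality (3) follows immediately. ■

FIGURE 4 The orthogonal projection of $\mathbf{y}$ onto $W$ is the closest point in $W$ to $\mathbf{y}$.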
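EXAMPLE 3 If $\mathbf{u}_1 = \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, and $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$, as in Example 2, then the closest point in $W$ to $\mathbf{y}$ is

\[ \hat{\mathbf{y}} = \frac{\mathbf{y}\cdot\mathbf{u}_1}{\mathbf{u}_1\cdot\mathbf{u}_1}\,\mathbf{u}_1 + \frac{\mathbf{y}\cdot\mathbf{u}_2}{\mathbf{u}_2\cdot\mathbf{u}_2}\,\mathbf{u}_2 = \begin{bmatrix} -2/5 \\ 2 \\ 1/5 \end{bmatrix} \]

EXAMPLE 4 The distance from a point $\mathbf{y}$ in $\mathbb{R}^n$ to a subspace $W$ is defined as the distance from $\mathbf{y}$ to the nearest point in $W$. Find the distance from $\mathbf{y}$ to $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$, where

\[ \mathbf{y} = \begin{bmatrix} -1 \\ -5 \\ 10 \end{bmatrix}, \quad \mathbf{u}_1 = \begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} \]

SOLUTION By the Best Approximation Theorem, the distance from $\mathbf{y}$ to $W$ is $\|\mathbf{y} - \hat{\mathbf{y}}\|$, where $\hat{\mathbf{y}} = \operatorname{proj}_W \mathbf{y}$. Since $\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthogonal basis for $W$,

\[ \hat{\mathbf{y}} = \frac{15}{30}\,\mathbf{u}_1 + \frac{-21}{6}\,\mathbf{u}_2 = \frac{1}{2}\begin{bmatrix} 5 \\ -2 \\ 1 \end{bmatrix} - \frac{7}{2}\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ -8 \\ 4 \end{bmatrix} \]

\[ \mathbf{y} - \hat{\mathbf{y}} = \begin{bmatrix} -1 \\ -5 \\ 10 \end{bmatrix} - \begin{bmatrix} -1 \\ -8 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 6 \end{bmatrix} \]

\[ \|\mathbf{y} - \hat{\mathbf{y}}\|^2 = 3^2 + 6^2 = 45 \]

The distance from $\mathbf{y}$ to $W$ is $\sqrt{45} = 3\sqrt{5}$. ■

The final theorem in this section shows how formula (2) for $\operatorname{proj}_W \mathbf{y}$ is simplified when the basis for $W$ is an orthonormal set.


THEOREM 10

If $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is an orthonormal basis for a subspace $W$ of $\mathbb{R}^n$, then

\[ \operatorname{proj}_W \mathbf{y} = (\mathbf{y}\cdot\mathbf{u}_1)\mathbf{u}_1 + (\mathbf{y}\cdot\mathbf{u}_2)\mathbf{u}_2 + \cdots + (\mathbf{y}\cdot\mathbf{u}_p)\mathbf{u}_p \qquad (4) \]

If $U = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_p\,]$, then

\[ \operatorname{proj}_W \mathbf{y} = UU^T\mathbf{y} \quad \text{for all } \mathbf{y} \text{ in } \mathbb{R}^n \qquad (5) \]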

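PROOF Formula (4) follows immediately from (2) in Theorem 8. Also, (4) shows that $\operatorname{proj}_W \mathbf{y}$ is a linear combination of the columns of $U$ using the weights $\mathbf{y}\cdot\mathbf{u}_1, \mathbf{y}\cdot\mathbf{u}_2, \ldots, \mathbf{y}\cdot\mathbf{u}_p$. The weights can be written as $\mathbf{u}_1^T\mathbf{y}, \mathbf{u}_2^T\mathbf{y}, \ldots, \mathbf{u}_p^T\mathbf{y}$, showing that they are the entries in $U^T\mathbf{y}$ and justifying (5). ■

Suppose $U$ is an $n \times p$ matrix with orthonormal columns, and let $W$ be the column space of $U$. Then

\[ U^T U\mathbf{x} = I_p\mathbf{x} = \mathbf{x} \quad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^p \]
\[ UU^T\mathbf{y} = \operatorname{proj}_W \mathbf{y} \quad \text{for all } \mathbf{y} \text{ in } \mathbb{R}^n \]

If $U$ is an $n \times n$ (square) matrix with orthonormal columns, then $U$ is an orthogonal matrix, the column space $W$ is all of $\mathbb{R}^n$, and $UU^T\mathbf{y} = I\mathbf{y} = \mathbf{y}$ for all $\mathbf{y}$ in $\mathbb{R}^n$.

Although formula (4) is important for theoretical purposes, in practice it usually involves calculations with square roots of numbers (in the entries of the $\mathbf{u}_i$). Formula (2) is recommended for hand calculations.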
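On a computer the square roots in formula (4) are no obstacle, so the product $UU^T\mathbf{y}$ is a natural way to compute a projection. The following sketch applies it to the data of Example 4 after normalizing the orthogonal basis. The helper names `normalize` and `project_orthonormal` are hypothetical, and the results are floating-point approximations.

```python
import math


def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a nonzero vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]


def project_orthonormal(y, cols):
    """Compute U U^T y, where the columns of U are the orthonormal vectors in cols."""
    weights = [dot(u, y) for u in cols]      # the entries of U^T y
    return [sum(w * u[i] for w, u in zip(weights, cols))
            for i in range(len(y))]


y = [-1.0, -5.0, 10.0]
cols = [normalize([5.0, -2.0, 1.0]), normalize([1.0, 2.0, -1.0])]

y_hat = project_orthonormal(y, cols)   # approximately [-1, -8, 4]
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
```

The computed distance agrees with the value $3\sqrt{5}$ found by hand in Example 4, up to rounding.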
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11. y = 

.Write v as the sum of two vectors, one in 

12. y = 

13 


In Exercises 13 and 14, find the best approximation to z by vectors 
of the form ciYi + C 2 \i. 


13. z 


14. z 


15. Let y : 


3" 


2' 


-7 


-1 


2 

, Vl = 

-3 

,V 2 = 

3_ 


1_ 


2" 


2" 


4 


0 


0 

_i 

,Vi = 

-1 

,V2 = 

— i 

5_ 

_ D 

■-3_ 


u 2 


Find the 


distance from y to the plane in R 3 spanned by Uj and 112 . 

16. Let y, Vi, and V 2 be as in Exercise 12. Find the distance from 
y to the subspace of R 4 spanned by Vi and V 2 . 


17. Let y : 

W = Span {ui, u 2 }. 


"4" 


"2/3" 


"-2/3" 

8 

,Ui = 

1/3 

,u 2 = 

2/3 

1 


2/3 


1/3 


and 



x = 2 . Write x as the sum of two vectors, one 

0_ 

Span {uj, U 2 , U 3 } and the other in Span {U 4 }. 



1 


-2 


1 


—1 


2 


1 


1 


1 

Ui = 

1 

， u 2 = 

-1 

,u 3 = 

-2 

, U4 = 

1 


1 


1 


-1 


-2 


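PRACTICE PROBLEM

Let $\mathbf{u}_1 = \begin{bmatrix} -7 \\ 1 \\ 4 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -1 \\ 1 \\ -2 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} -9 \\ 1 \\ 6 \end{bmatrix}$, and $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$. Use the fact that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal to compute $\operatorname{proj}_W \mathbf{y}$.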


6.3 EXERCISES 


In Exercises 1 and 2, you may assume that {ui,... , 114 } is an 
orthogonal basis for R 4 . 


8. y 


,u 2 


0 


3 


1 


5 


1 - 」 


L 」 


1 - 」 



1 


5 


0 


-3 


4 


1 


一 1 


-1 

-4 

-1 

,u 2 = 

1 

1 

,u 3 = 

1 

-4 

,U4 = 

-1 

1 

9 . y = 

3 

3 

,Ui = 

1 

0 

,U2 = 

3 

1 

,U3 = 

0 

1 

"101 








-1 


1 


-2 


1 




6 


-4 


0 

6. y = 

4 

,Ui = 

-1 

,u 2 = 

1 


_ 1_ 


1_ 


1 


In Exercises 7-10, let W be the subspace spanned by the u’s, and 
write y as the sum of a vector in W and a vector orthogonal to W. 


7. y = 

_ r 
3 

,Ui = 

r 

3 

, u 2 = 

"5" 

1 


5 


-2 


4 


Span {ui} and the other in Span { 112 , 113 , 114 }. 

In Exercises 3-6, verify that {ui, 112 } is an orthogonal set, and then 
find the orthogonal projection of y onto Span {uj, U 2 }. 



"-1" 


_r 


"-1 " 

3. y = 

4 

,Ui = 

1 

,u 2 = 

1 


3_ 


_ 0 _ 


0 _ 


6" 


"3" 


"-4" 

4. y = 

3 

,Ui = 

4 

,u 2 = 

3 


-2 


0 


0 



3 


1 


1 


0 


4 


1 


0 


-1 

y = 

5 

,Ui = 

0 

,u 2 = 

1 

,u 3 = 

1 


6 


-1 


1 


-1 


In Exercises 11 and 12, find the closest point to y in the subspace 
W spanned by \\ and V 2 . 



1 


-4 


-2 


1 

,Vi = 

-1 

,V 2 = 

0 


2 


3 


4 5 3 3 


UI 


y 

5. 
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a. Let t/ = [uj u 2 ]. Compute U T U and UU T . 

b. Compute proj^ y and (UU T )y. 


18 . Let y : 


" 7 ' 


'1/-/10" 

_9 

， u i = 

-—11 损 - 


,and W = Span {ui}. 


a. Let U be the 2 x 1 matrix whose only column is 
Compute U T U and UU T . 

b. Compute proj^ y and (UU T )y. 


Note that 


Ui and U2 are orthogonal but that U3 is not orthogonal to Ui or 
U2. It can be shown that U3 is not in the subspace W spanned 
by Ui and 112. Use this fact to construct a nonzero vector y in 
R 3 that is orthogonal to Ui and 112. 


19 . Let Ui = 

" r 
1 

,u 2 = 

5 " 

-1 

,and U3 = 

"0" 

0 


-2 


2 


1 


20 . Let Ui and U2 be as in Exercise 19 , and let U4 


0 


It can 


be shown that U4 is not in the subspace W spanned by u! and 
U2. Use this fact to construct a nonzero vector y in R 3 that is 
orthogonal to Ui and 112. 


In Exercises 21 and 22, all vectors and subspaces are in $\mathbb{R}^n$. Mark each statement True or False. Justify each answer.

21. a. If $\mathbf{z}$ is orthogonal to $\mathbf{u}_1$ and to $\mathbf{u}_2$ and if $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$, then $\mathbf{z}$ must be in $W^{\perp}$.

b. For each $\mathbf{y}$ and each subspace $W$, the vector $\mathbf{y} - \operatorname{proj}_W \mathbf{y}$ is orthogonal to $W$.

c. The orthogonal projection $\hat{\mathbf{y}}$ of $\mathbf{y}$ onto a subspace $W$ can sometimes depend on the orthogonal basis for $W$ used to compute $\hat{\mathbf{y}}$.

d. If $\mathbf{y}$ is in a subspace $W$, then the orthogonal projection of $\mathbf{y}$ onto $W$ is $\mathbf{y}$ itself.

e. If the columns of an $n \times p$ matrix $U$ are orthonormal, then $UU^T\mathbf{y}$ is the orthogonal projection of $\mathbf{y}$ onto the column space of $U$.

22. a. If $W$ is a subspace of $\mathbb{R}^n$ and if $\mathbf{y}$ is in both $W$ and $W^{\perp}$, then $\mathbf{y}$ must be the zero vector.

b. In the Orthogonal Decomposition Theorem, each term in formula (2) for $\hat{\mathbf{y}}$ is itself an orthogonal projection of $\mathbf{y}$ onto a subspace of $W$.

c. If $\mathbf{y} = \mathbf{z}_1 + \mathbf{z}_2$, where $\mathbf{z}_1$ is in a subspace $W$ and $\mathbf{z}_2$ is in $W^{\perp}$, then $\mathbf{z}_1$ must be the orthogonal projection of $\mathbf{y}$ onto $W$.

d. The best approximation to $\mathbf{y}$ by elements of a subspace $W$ is given by the vector $\mathbf{y} - \operatorname{proj}_W \mathbf{y}$.

e. If an $n \times p$ matrix $U$ has orthonormal columns, then $UU^T\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

23. Let $A$ be an $m \times n$ matrix. Prove that every vector $\mathbf{x}$ in $\mathbb{R}^n$ can be written in the form $\mathbf{x} = \mathbf{p} + \mathbf{u}$, where $\mathbf{p}$ is in $\operatorname{Row} A$ and $\mathbf{u}$ is in $\operatorname{Nul} A$. Also, show that if the equation $A\mathbf{x} = \mathbf{b}$ is consistent, then there is a unique $\mathbf{p}$ in $\operatorname{Row} A$ such that $A\mathbf{p} = \mathbf{b}$.

24. Let $W$ be a subspace of $\mathbb{R}^n$ with an orthogonal basis $\{\mathbf{w}_1, \ldots, \mathbf{w}_p\}$, and let $\{\mathbf{v}_1, \ldots, \mathbf{v}_q\}$ be an orthogonal basis for $W^{\perp}$.

a. Explain why $\{\mathbf{w}_1, \ldots, \mathbf{w}_p, \mathbf{v}_1, \ldots, \mathbf{v}_q\}$ is an orthogonal set.

b. Explain why the set in part (a) spans $\mathbb{R}^n$.

c. Show that $\dim W + \dim W^{\perp} = n$.

25. [M] Let $U$ be the $8 \times 4$ matrix in Exercise 36 in Section 6.2. Find the closest point to $\mathbf{y} = (1, 1, 1, 1, 1, 1, 1, 1)$ in $\operatorname{Col} U$. Write the keystrokes or commands you use to solve this problem.

26. [M] Let $U$ be the matrix in Exercise 25. Find the distance from $\mathbf{b} = (1, 1, 1, 1, -1, -1, -1, -1)$ to $\operatorname{Col} U$.


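SOLUTION TO PRACTICE PROBLEM

Compute

\[ \operatorname{proj}_W \mathbf{y} = \frac{\mathbf{y}\cdot\mathbf{u}_1}{\mathbf{u}_1\cdot\mathbf{u}_1}\,\mathbf{u}_1 + \frac{\mathbf{y}\cdot\mathbf{u}_2}{\mathbf{u}_2\cdot\mathbf{u}_2}\,\mathbf{u}_2 = \frac{88}{66}\,\mathbf{u}_1 + \frac{-2}{6}\,\mathbf{u}_2 = \frac{4}{3}\begin{bmatrix} -7 \\ 1 \\ 4 \end{bmatrix} - \frac{1}{3}\begin{bmatrix} -1 \\ 1 \\ -2 \end{bmatrix} = \begin{bmatrix} -9 \\ 1 \\ 6 \end{bmatrix} = \mathbf{y} \]

In this case, $\mathbf{y}$ happens to be a linear combination of $\mathbf{u}_1$ and $\mathbf{u}_2$, so $\mathbf{y}$ is in $W$. The closest point in $W$ to $\mathbf{y}$ is $\mathbf{y}$ itself.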
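6.4 THE GRAM-SCHMIDT PROCESS


The Gram-Schmidt process is a simple algorithm for producing an orthogonal or orthonormal basis for any nonzero subspace of $\mathbb{R}^n$. The first two examples of the process are aimed at hand calculation.

FIGURE 1 Construction of an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2\}$.


EXAMPLE 1 Let $W = \operatorname{Span}\{\mathbf{x}_1, \mathbf{x}_2\}$, where $\mathbf{x}_1 = \begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix}$ and $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$. Construct an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2\}$ for $W$.


SOLUTION The subspace $W$ is shown in Fig. 1, along with $\mathbf{x}_1$, $\mathbf{x}_2$, and the projection $\mathbf{p}$ of $\mathbf{x}_2$ onto $\mathbf{x}_1$. The component of $\mathbf{x}_2$ orthogonal to $\mathbf{x}_1$ is $\mathbf{x}_2 - \mathbf{p}$, which is in $W$ because it is formed from $\mathbf{x}_2$ and a multiple of $\mathbf{x}_1$. Let $\mathbf{v}_1 = \mathbf{x}_1$ and

\[ \mathbf{v}_2 = \mathbf{x}_2 - \mathbf{p} = \mathbf{x}_2 - \frac{\mathbf{x}_2\cdot\mathbf{x}_1}{\mathbf{x}_1\cdot\mathbf{x}_1}\,\mathbf{x}_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} - \frac{15}{45}\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} \]

Then $\{\mathbf{v}_1, \mathbf{v}_2\}$ is an orthogonal set of nonzero vectors in $W$. Since $\dim W = 2$, the set $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis for $W$. ■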

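The next example fully illustrates the Gram-Schmidt process. Study it carefully.


EXAMPLE 2 Let $\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}$, and $\mathbf{x}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}$. Then $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ is clearly linearly independent and thus is a basis for a subspace $W$ of $\mathbb{R}^4$. Construct an orthogonal basis for $W$.


SOLUTION

Step 1. Let $\mathbf{v}_1 = \mathbf{x}_1$ and $W_1 = \operatorname{Span}\{\mathbf{x}_1\} = \operatorname{Span}\{\mathbf{v}_1\}$.

Step 2. Let $\mathbf{v}_2$ be the vector produced by subtracting from $\mathbf{x}_2$ its projection onto the subspace $W_1$. That is, let

\[ \mathbf{v}_2 = \mathbf{x}_2 - \operatorname{proj}_{W_1}\mathbf{x}_2 = \mathbf{x}_2 - \frac{\mathbf{x}_2\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\,\mathbf{v}_1 \qquad \text{(since } \mathbf{v}_1 = \mathbf{x}_1\text{)} \]

\[ = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix} - \frac{3}{4}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -3/4 \\ 1/4 \\ 1/4 \\ 1/4 \end{bmatrix} \]

As in Example 1, $\mathbf{v}_2$ is the component of $\mathbf{x}_2$ orthogonal to $\mathbf{x}_1$, and $\{\mathbf{v}_1, \mathbf{v}_2\}$ is an orthogonal basis for the subspace $W_2$ spanned by $\mathbf{x}_1$ and $\mathbf{x}_2$.

Step 2′ (optional). If appropriate, scale $\mathbf{v}_2$ to simplify later computations. Since $\mathbf{v}_2$ has fractional entries, it is convenient to scale it by a factor of 4 and replace $\{\mathbf{v}_1, \mathbf{v}_2\}$ by the orthogonal basis

\[ \mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2' = \begin{bmatrix} -3 \\ 1 \\ 1 \\ 1 \end{bmatrix} \]

Step 3. Let $\mathbf{v}_3$ be the vector produced by subtracting from $\mathbf{x}_3$ its projection onto the subspace $W_2$. Use the orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2'\}$ to compute this projection onto $W_2$:

\[ \operatorname{proj}_{W_2}\mathbf{x}_3 = \underbrace{\frac{\mathbf{x}_3\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\,\mathbf{v}_1}_{\text{projection of } \mathbf{x}_3 \text{ onto } \mathbf{v}_1} + \underbrace{\frac{\mathbf{x}_3\cdot\mathbf{v}_2'}{\mathbf{v}_2'\cdot\mathbf{v}_2'}\,\mathbf{v}_2'}_{\text{projection of } \mathbf{x}_3 \text{ onto } \mathbf{v}_2'} = \frac{2}{4}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + \frac{2}{12}\begin{bmatrix} -3 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 2/3 \\ 2/3 \\ 2/3 \end{bmatrix} \]

Then $\mathbf{v}_3$ is the component of $\mathbf{x}_3$ orthogonal to $W_2$, namely,

\[ \mathbf{v}_3 = \mathbf{x}_3 - \operatorname{proj}_{W_2}\mathbf{x}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 2/3 \\ 2/3 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 0 \\ -2/3 \\ 1/3 \\ 1/3 \end{bmatrix} \]

See Fig. 2 for a diagram of this construction. Observe that $\mathbf{v}_3$ is in $W$, because $\mathbf{x}_3$ and $\operatorname{proj}_{W_2}\mathbf{x}_3$ are both in $W$. Thus $\{\mathbf{v}_1, \mathbf{v}_2', \mathbf{v}_3\}$ is an orthogonal set of nonzero vectors and hence a linearly independent set in $W$. Note that $W$ is three-dimensional since it was defined by a basis of three vectors. Hence, by the Basis Theorem in Section 4.5, $\{\mathbf{v}_1, \mathbf{v}_2', \mathbf{v}_3\}$ is an orthogonal basis for $W$. ■

FIGURE 2 The construction of $\mathbf{v}_3$ from $\mathbf{x}_3$ and $W_2$.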

The proof of the next theorem shows that this strategy really works. Scaling of 
vectors is not mentioned because that is used only to simplify hand calculations. 


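THEOREM 11 The Gram-Schmidt Process

Given a basis $\{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$ for a nonzero subspace $W$ of $\mathbb{R}^n$, define

\[ \mathbf{v}_1 = \mathbf{x}_1 \]
\[ \mathbf{v}_2 = \mathbf{x}_2 - \frac{\mathbf{x}_2\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\,\mathbf{v}_1 \]
\[ \mathbf{v}_3 = \mathbf{x}_3 - \frac{\mathbf{x}_3\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\,\mathbf{v}_1 - \frac{\mathbf{x}_3\cdot\mathbf{v}_2}{\mathbf{v}_2\cdot\mathbf{v}_2}\,\mathbf{v}_2 \]
\[ \vdots \]
\[ \mathbf{v}_p = \mathbf{x}_p - \frac{\mathbf{x}_p\cdot\mathbf{v}_1}{\mathbf{v}_1\cdot\mathbf{v}_1}\,\mathbf{v}_1 - \frac{\mathbf{x}_p\cdot\mathbf{v}_2}{\mathbf{v}_2\cdot\mathbf{v}_2}\,\mathbf{v}_2 - \cdots - \frac{\mathbf{x}_p\cdot\mathbf{v}_{p-1}}{\mathbf{v}_{p-1}\cdot\mathbf{v}_{p-1}}\,\mathbf{v}_{p-1} \]

Then $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is an orthogonal basis for $W$. In addition

\[ \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\} = \operatorname{Span}\{\mathbf{x}_1, \ldots, \mathbf{x}_k\} \quad \text{for } 1 \le k \le p \qquad (1) \]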
PROOF For $1 \le k \le p$, let $W_k = \operatorname{Span}\{\mathbf{x}_1, \ldots, \mathbf{x}_k\}$. Set $\mathbf{v}_1 = \mathbf{x}_1$, so that $\operatorname{Span}\{\mathbf{v}_1\} = \operatorname{Span}\{\mathbf{x}_1\}$. Suppose, for some $k < p$, we have constructed $\mathbf{v}_1, \ldots, \mathbf{v}_k$ so that $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is an orthogonal basis for $W_k$. Define

\[ \mathbf{v}_{k+1} = \mathbf{x}_{k+1} - \operatorname{proj}_{W_k}\mathbf{x}_{k+1} \qquad (2) \]

By the Orthogonal Decomposition Theorem, $\mathbf{v}_{k+1}$ is orthogonal to $W_k$. Note that $\operatorname{proj}_{W_k}\mathbf{x}_{k+1}$ is in $W_k$ and hence also in $W_{k+1}$. Since $\mathbf{x}_{k+1}$ is in $W_{k+1}$, so is $\mathbf{v}_{k+1}$ (because $W_{k+1}$ is a subspace and is closed under subtraction). Furthermore, $\mathbf{v}_{k+1} \neq \mathbf{0}$ because $\mathbf{x}_{k+1}$ is not in $W_k = \operatorname{Span}\{\mathbf{x}_1, \ldots, \mathbf{x}_k\}$. Hence $\{\mathbf{v}_1, \ldots, \mathbf{v}_{k+1}\}$ is an orthogonal set of nonzero vectors in the $(k+1)$-dimensional space $W_{k+1}$. By the Basis Theorem in Section 4.5, this set is an orthogonal basis for $W_{k+1}$. Hence $W_{k+1} = \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_{k+1}\}$. When $k + 1 = p$, the process stops. ■


Theorem 11 shows that any nonzero subspace $W$ of $\mathbb{R}^n$ has an orthogonal basis, because an ordinary basis $\{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$ is always available (by Theorem 11 in Section 4.5), and the Gram-Schmidt process depends only on the existence of orthogonal projections onto subspaces of $W$ that already have orthogonal bases.
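The formulas of Theorem 11 translate directly into a short routine. The sketch below is an illustration only: the function name `gram_schmidt` is hypothetical, the input is assumed to have integer or rational entries so that exact fractions can be used, and no scaling step (such as Step 2′ of Example 2) is performed.

```python
from fractions import Fraction


def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(xs):
    """Return an orthogonal basis v_1, ..., v_p for Span(xs), following Theorem 11.

    xs is assumed to be a linearly independent list of vectors with
    integer or Fraction entries.
    """
    vs = []
    for x in xs:
        v = [Fraction(c) for c in x]
        for u in vs:
            # Subtract the projection of x onto each earlier v_j.
            weight = Fraction(dot(x, u), dot(u, u))
            v = [vi - weight * ui for vi, ui in zip(v, u)]
        if not any(v):
            raise ValueError("input vectors are linearly dependent")
        vs.append(v)
    return vs


# The basis of Example 2.
x1, x2, x3 = [1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]
v1, v2, v3 = gram_schmidt([x1, x2, x3])
```

Because nothing is rescaled, `v2` comes out as $(-3/4, 1/4, 1/4, 1/4)$ rather than the scaled vector $(-3, 1, 1, 1)$ used in Example 2; `v3` is the same either way, since a projection onto a line does not depend on the scale of the spanning vector.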


Orthonormal Bases 


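An orthonormal basis is constructed easily from an orthogonal basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$: simply normalize (i.e., "scale") all the $\mathbf{v}_k$. When working problems by hand, this is easier than normalizing each $\mathbf{v}_k$ as soon as it is found (because it avoids unnecessary writing of square roots).

EXAMPLE 3 Example 1 constructed the orthogonal basis

\[ \mathbf{v}_1 = \begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} \]

An orthonormal basis is

\[ \mathbf{u}_1 = \frac{1}{\|\mathbf{v}_1\|}\,\mathbf{v}_1 = \frac{1}{\sqrt{45}}\begin{bmatrix} 3 \\ 6 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \\ 0 \end{bmatrix} \]

\[ \mathbf{u}_2 = \frac{1}{\|\mathbf{v}_2\|}\,\mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

■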


QR Factorization of Matrices 

If an $m \times n$ matrix $A$ has linearly independent columns $\mathbf{x}_1, \ldots, \mathbf{x}_n$, then applying the Gram-Schmidt process (with normalizations) to $\mathbf{x}_1, \ldots, \mathbf{x}_n$ amounts to factoring $A$, as described in the next theorem. This factorization is widely used in computer algorithms for various computations, such as solving equations (discussed in Section 6.5) and finding eigenvalues (mentioned in the exercises for Section 5.2).
THEOREM 12 The QR Factorization

If $A$ is an $m \times n$ matrix with linearly independent columns, then $A$ can be factored as $A = QR$, where $Q$ is an $m \times n$ matrix whose columns form an orthonormal basis for $\operatorname{Col} A$ and $R$ is an $n \times n$ upper triangular invertible matrix with positive entries on its diagonal.


PROOF The columns of $A$ form a basis $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ for $\operatorname{Col} A$. Construct an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ for $W = \operatorname{Col} A$ with property (1) in Theorem 11. This basis may be constructed by the Gram-Schmidt process or some other means. Let

\[ Q = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_n\,] \]

For $k = 1, \ldots, n$, $\mathbf{x}_k$ is in $\operatorname{Span}\{\mathbf{x}_1, \ldots, \mathbf{x}_k\} = \operatorname{Span}\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$. So there are constants, $r_{1k}, \ldots, r_{kk}$, such that

\[ \mathbf{x}_k = r_{1k}\mathbf{u}_1 + \cdots + r_{kk}\mathbf{u}_k + 0\cdot\mathbf{u}_{k+1} + \cdots + 0\cdot\mathbf{u}_n \]

We may assume that $r_{kk} \ge 0$. (If $r_{kk} < 0$, multiply both $r_{kk}$ and $\mathbf{u}_k$ by $-1$.) This shows that $\mathbf{x}_k$ is a linear combination of the columns of $Q$ using as weights the entries in the vector

\[ \mathbf{r}_k = \begin{bmatrix} r_{1k} \\ \vdots \\ r_{kk} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \]

That is, $\mathbf{x}_k = Q\mathbf{r}_k$ for $k = 1, \ldots, n$. Let $R = [\,\mathbf{r}_1 \;\; \cdots \;\; \mathbf{r}_n\,]$. Then

\[ A = [\,\mathbf{x}_1 \;\; \cdots \;\; \mathbf{x}_n\,] = [\,Q\mathbf{r}_1 \;\; \cdots \;\; Q\mathbf{r}_n\,] = QR \]

The fact that $R$ is invertible follows easily from the fact that the columns of $A$ are linearly independent (Exercise 19). Since $R$ is clearly upper triangular, its nonnegative diagonal entries must be positive. ■


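EXAMPLE 4 Find a QR factorization of

\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\]

SOLUTION The columns of $A$ are the vectors $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ in Example 2. An
orthogonal basis for $\operatorname{Col} A = \operatorname{Span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ was found in that example:

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -3 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 0 \\ -2/3 \\ 1/3 \\ 1/3 \end{bmatrix}
\]

To simplify the arithmetic that follows, scale $\mathbf{v}_3$ by letting $\mathbf{v}_3' = 3\mathbf{v}_3$. Then normalize
the three vectors to obtain $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$, and use these vectors as the columns of $Q$:

\[
Q = \begin{bmatrix}
1/2 & -3/\sqrt{12} & 0 \\
1/2 & 1/\sqrt{12} & -2/\sqrt{6} \\
1/2 & 1/\sqrt{12} & 1/\sqrt{6} \\
1/2 & 1/\sqrt{12} & 1/\sqrt{6}
\end{bmatrix}
\]

By construction, the first $k$ columns of $Q$ are an orthonormal basis of $\operatorname{Span}\{\mathbf{x}_1, \ldots, \mathbf{x}_k\}$.
From the proof of Theorem 12, $A = QR$ for some $R$. To find $R$, observe that $Q^TQ = I$,
because the columns of $Q$ are orthonormal. Hence

\[
Q^TA = Q^T(QR) = IR = R
\]

and

\[
R = \begin{bmatrix}
1/2 & 1/2 & 1/2 & 1/2 \\
-3/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} \\
0 & -2/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6}
\end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
= \begin{bmatrix}
2 & 3/2 & 1 \\
0 & 3/\sqrt{12} & 2/\sqrt{12} \\
0 & 0 & 2/\sqrt{6}
\end{bmatrix}
\]

■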
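
The construction in Example 4 can be reproduced numerically. The sketch below is a plain classical Gram-Schmidt with normalization, followed by $R = Q^TA$; it is meant only as an illustration of the steps above, not as the method a numerical library would use. The function names `dot` and `gram_schmidt_qr` are assumptions introduced for this example.

```python
import math


def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt_qr(columns):
    """Classical Gram-Schmidt with normalization.

    columns: the linearly independent columns x_1, ..., x_n of A.
    Returns (q_cols, R): q_cols holds the orthonormal columns of Q,
    and R is the n x n matrix Q^T A stored as a list of rows.
    """
    q_cols = []
    for x in columns:
        v = [float(c) for c in x]
        # Subtract the projection of x onto each earlier unit vector.
        for u in q_cols:
            coeff = dot(x, u)
            v = [vi - coeff * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        q_cols.append([vi / norm for vi in v])
    n = len(columns)
    R = [[dot(q_cols[i], columns[j]) for j in range(n)] for i in range(n)]
    return q_cols, R


# Columns of the matrix A in Example 4.
a_cols = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
q_cols, R = gram_schmidt_qr(a_cols)
```

In floating-point arithmetic the entries of `R` below the diagonal come out as tiny roundoff values instead of exact zeros, which is a small instance of the effect discussed in the Numerical Notes.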


NUMERICAL NOTES

1. When the Gram-Schmidt process is run on a computer, roundoff error can
build up as the vectors $\mathbf{u}_k$ are calculated, one by one. For $j$ and $k$ large but
unequal, the inner products $\mathbf{u}_j^T\mathbf{u}_k$ may not be sufficiently close to zero. This
loss of orthogonality can be reduced substantially by rearranging the order
of the calculations.¹ However, a different computer-based QR factorization is
usually preferred to this modified Gram-Schmidt method because it yields a
more accurate orthonormal basis, even though the factorization requires about
twice as much arithmetic.

2. To produce a QR factorization of a matrix $A$, a computer program usually
left-multiplies $A$ by a sequence of orthogonal matrices until $A$ is transformed
into an upper triangular matrix. This construction is analogous to the
left-multiplication by elementary matrices that produces an LU factorization of $A$.
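
The rearrangement mentioned in Numerical Note 1 can be sketched as follows. This is one common form of the modified Gram-Schmidt method, offered under the assumption that it is the kind of reordering the note has in mind: as soon as a unit vector $\mathbf{u}_k$ is available, its component is removed from every remaining vector, instead of waiting until each vector's turn arrives. The function name and the test vectors (the columns of $A$ in Example 4) are illustrative choices.

```python
import math


def modified_gram_schmidt(columns):
    """Orthonormalize linearly independent vectors.

    Each new unit vector u_k is immediately projected out of all
    the vectors that have not yet been processed.
    """
    vs = [[float(c) for c in x] for x in columns]
    us = []
    for k in range(len(vs)):
        norm = math.sqrt(sum(c * c for c in vs[k]))
        u = [c / norm for c in vs[k]]
        us.append(u)
        # Remove the u-component from every later (partially reduced) vector.
        for j in range(k + 1, len(vs)):
            coeff = sum(a * b for a, b in zip(u, vs[j]))
            vs[j] = [a - coeff * b for a, b in zip(vs[j], u)]
    return us


us = modified_gram_schmidt([[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]])

# Gram matrix of the result; it should be close to the identity.
gram = [[sum(a * b for a, b in zip(p, q)) for q in us] for p in us]
```

In exact arithmetic this produces the same vectors as the classical process; the difference shows up only in how roundoff errors accumulate.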


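PRACTICE PROBLEM

Let $W = \operatorname{Span}\{\mathbf{x}_1, \mathbf{x}_2\}$, where
$\mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and
$\mathbf{x}_2 = \begin{bmatrix} 1/3 \\ 1/3 \\ -2/3 \end{bmatrix}$.
Construct an orthonormal basis for $W$.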


6.4 EXERCISES 

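In Exercises 1-6, the given set is a basis for a subspace $W$. Use
the Gram-Schmidt process to produce an orthogonal basis for $W$.

1. $\begin{bmatrix} 3 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 8 \\ 5 \\ -6 \end{bmatrix}$

2. $\begin{bmatrix} 0 \\ 4 \\ 2 \end{bmatrix}, \begin{bmatrix} 5 \\ 6 \\ -7 \end{bmatrix}$

3. $\begin{bmatrix} 2 \\ -5 \\ 1 \end{bmatrix}, \begin{bmatrix} 4 \\ -1 \\ 2 \end{bmatrix}$

4. $\begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix}, \begin{bmatrix} -3 \\ 14 \\ -7 \end{bmatrix}$

5. $\begin{bmatrix} 1 \\ -4 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 7 \\ -7 \\ -4 \\ 1 \end{bmatrix}$

6. $\begin{bmatrix} 3 \\ -1 \\ 2 \\ -1 \end{bmatrix}, \begin{bmatrix} -5 \\ 9 \\ -9 \\ 3 \end{bmatrix}$
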

1 See Fundamentals of Matrix Computations, by David S. Watkins (New York: John Wiley & Sons, 1991), 
pp. 167-180. 














































7. Find an orthonormal basis of the subspace spanned by the 
vectors in Exercise 3. 

8. Find an orthonormal basis of the subspace spanned by the 
vectors in Exercise 4. 


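Find an orthogonal basis for the column space of each matrix in
Exercises 9-12.

9. $\begin{bmatrix} 3 & -5 & 1 \\ 1 & 1 & 1 \\ -1 & 5 & -2 \\ 3 & -7 & 8 \end{bmatrix}$

10. $\begin{bmatrix} -1 & 6 & 6 \\ 3 & -8 & 3 \\ 1 & -2 & 6 \\ 1 & -4 & -3 \end{bmatrix}$

11. $\begin{bmatrix} 1 & 2 & 5 \\ -1 & 1 & -4 \\ -1 & 4 & -3 \\ 1 & -4 & 7 \\ 1 & 2 & 1 \end{bmatrix}$

12. $\begin{bmatrix} 1 & 3 & 5 \\ -1 & -3 & 1 \\ 0 & 2 & 3 \\ 1 & 5 & 2 \\ 1 & 5 & 8 \end{bmatrix}$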


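In Exercises 13 and 14, the columns of $Q$ were obtained by
applying the Gram-Schmidt process to the columns of $A$. Find an
upper triangular matrix $R$ such that $A = QR$. Check your work.

13. $A = \begin{bmatrix} 5 & 9 \\ 1 & 7 \\ -3 & -5 \\ 1 & 5 \end{bmatrix}, \quad
Q = \begin{bmatrix} 5/6 & -1/6 \\ 1/6 & 5/6 \\ -3/6 & 1/6 \\ 1/6 & 3/6 \end{bmatrix}$

14. $A = \begin{bmatrix} -2 & 3 \\ 5 & 7 \\ 2 & -2 \\ 4 & 6 \end{bmatrix}, \quad
Q = \begin{bmatrix} -2/7 & 5/7 \\ 5/7 & 2/7 \\ 2/7 & -4/7 \\ 4/7 & 2/7 \end{bmatrix}$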


15. Find a QR factorization of the matrix in Exercise 11. 


16. Find a QR factorization of the matrix in Exercise 12. 


In Exercises 17 and 18, all vectors and subspaces are in $\mathbb{R}^n$. Mark
each statement True or False. Justify each answer.

17. a. If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal basis for $W$, then
multiplying $\mathbf{v}_3$ by a scalar $c$ gives a new orthogonal basis
$\{\mathbf{v}_1, \mathbf{v}_2, c\mathbf{v}_3\}$.

b. The Gram-Schmidt process produces from a linearly
independent set $\{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$ an orthogonal set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$
with the property that for each $k$, the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$
span the same subspace as that spanned by $\mathbf{x}_1, \ldots, \mathbf{x}_k$.

c. If $A = QR$, where $Q$ has orthonormal columns, then
$R = Q^TA$.

18. a. If $W = \operatorname{Span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ with $\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ linearly
independent, and if $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal set in $W$, then
$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $W$.

b. If $\mathbf{x}$ is not in a subspace $W$, then $\mathbf{x} - \operatorname{proj}_W \mathbf{x}$ is not zero.

c. In a QR factorization, say $A = QR$ (when $A$ has
linearly independent columns), the columns of $Q$ form an
orthonormal basis for the column space of $A$.


19. Suppose $A = QR$, where $Q$ is $m \times n$ and $R$ is $n \times n$. Show
that if the columns of $A$ are linearly independent, then $R$ must
be invertible. [Hint: Study the equation $R\mathbf{x} = \mathbf{0}$ and use the
fact that $A = QR$.]

20. Suppose $A = QR$, where $R$ is an invertible matrix. Show
that $A$ and $Q$ have the same column space. [Hint: Given $\mathbf{y}$ in
$\operatorname{Col} A$, show that $\mathbf{y} = Q\mathbf{x}$ for some $\mathbf{x}$. Also, given $\mathbf{y}$ in $\operatorname{Col} Q$,
show that $\mathbf{y} = A\mathbf{x}$ for some $\mathbf{x}$.]

21. Given $A = QR$ as in Theorem 12, describe how to find an
orthogonal $m \times m$ (square) matrix $Q_1$ and an invertible $n \times n$
upper triangular matrix $R$ such that

\[
A = Q_1 \begin{bmatrix} R \\ 0 \end{bmatrix}
\]

The MATLAB qr command supplies this "full" QR factorization
when rank $A = n$.

22. Let $\mathbf{u}_1, \ldots, \mathbf{u}_p$ be an orthogonal basis for a subspace $W$ of
$\mathbb{R}^n$, and let $T : \mathbb{R}^n \to \mathbb{R}^n$ be defined by $T(\mathbf{x}) = \operatorname{proj}_W \mathbf{x}$.
Show that $T$ is a linear transformation.


23. Suppose $A = QR$ is a QR factorization of an $m \times n$ matrix
$A$ (with linearly independent columns). Partition $A$ as
$[\,A_1 \;\; A_2\,]$, where $A_1$ has $p$ columns. Show how to obtain a
QR factorization of $A_1$, and explain why your factorization
has the appropriate properties.


24. [M] Use the Gram-Schmidt process as in Example 2 to 
produce an orthogonal basis for the column space of 


-10 

2 

—6 

16 

2 


13 

1 

3 

-16 


7 

-5 

13 

-2 

-5 



25. [M] Use the method in this section to produce a QR factor¬ 
ization of the matrix in Exercise 24. 


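26. [M] For a matrix program, the Gram-Schmidt process works
better with orthonormal vectors. Starting with $\mathbf{x}_1, \ldots, \mathbf{x}_p$ as
in Theorem 11, let $A = [\,\mathbf{x}_1 \;\; \cdots \;\; \mathbf{x}_p\,]$. Suppose $Q$ is an
$n \times k$ matrix whose columns form an orthonormal basis for
the subspace $W_k$ spanned by the first $k$ columns of $A$. Then
for $\mathbf{x}$ in $\mathbb{R}^n$, $QQ^T\mathbf{x}$ is the orthogonal projection of $\mathbf{x}$ onto $W_k$
(Theorem 10 in Section 6.3). If $\mathbf{x}_{k+1}$ is the next column of
$A$, then equation (2) in the proof of Theorem 11 becomes

\[
\mathbf{v}_{k+1} = \mathbf{x}_{k+1} - Q(Q^T\mathbf{x}_{k+1})
\]

(The parentheses above reduce the number of arithmetic
operations.) Let $\mathbf{u}_{k+1} = \mathbf{v}_{k+1}/\|\mathbf{v}_{k+1}\|$. The new $Q$ for the
next step is $[\,Q \;\; \mathbf{u}_{k+1}\,]$. Use this procedure to compute the
QR factorization of the matrix in Exercise 24. Write the
keystrokes or commands you use.

SOLUTION TO PRACTICE PROBLEM

Let $\mathbf{v}_1 = \mathbf{x}_1$ and
$\mathbf{v}_2 = \mathbf{x}_2 - \dfrac{\mathbf{x}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1 = \mathbf{x}_2 - 0\mathbf{v}_1 = \mathbf{x}_2$.
So $\{\mathbf{x}_1, \mathbf{x}_2\}$ is already orthogonal. All that is needed is to normalize the vectors. Let

\[
\mathbf{u}_1 = \frac{1}{\|\mathbf{v}_1\|}\mathbf{v}_1
= \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}
\]

Instead of normalizing $\mathbf{v}_2$ directly, normalize $\mathbf{v}_2' = 3\mathbf{v}_2$ instead:

\[
\mathbf{u}_2 = \frac{1}{\|\mathbf{v}_2'\|}\mathbf{v}_2'
= \frac{1}{\sqrt{1^2 + 1^2 + (-2)^2}} \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}
= \begin{bmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{bmatrix}
\]

Then $\{\mathbf{u}_1, \mathbf{u}_2\}$ is an orthonormal basis for $W$.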


6.5 LEAST-SQUARES PROBLEMS 

The chapter’s introductory example described a massive problem Ax = b that had no 
solution. Inconsistent systems arise often in applications, though usually not with such 
an enormous coefficient matrix. When a solution is demanded and none exists, the best 
one can do is to find an x that makes Ax as close as possible to b. 

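Think of $A\mathbf{x}$ as an approximation to $\mathbf{b}$. The smaller the distance between $\mathbf{b}$ and $A\mathbf{x}$,
given by $\|\mathbf{b} - A\mathbf{x}\|$, the better the approximation. The general least-squares problem
is to find an $\mathbf{x}$ that makes $\|\mathbf{b} - A\mathbf{x}\|$ as small as possible. The adjective "least-squares"
arises from the fact that $\|\mathbf{b} - A\mathbf{x}\|$ is the square root of a sum of squares.


DEFINITION If $A$ is $m \times n$ and $\mathbf{b}$ is in $\mathbb{R}^m$, a least-squares solution of $A\mathbf{x} = \mathbf{b}$ is an $\hat{\mathbf{x}}$ in
$\mathbb{R}^n$ such that

\[
\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|
\]

for all $\mathbf{x}$ in $\mathbb{R}^n$.


The most important aspect of the least-squares problem is that no matter what $\mathbf{x}$ we
select, the vector $A\mathbf{x}$ will necessarily be in the column space, $\operatorname{Col} A$. So we seek an $\mathbf{x}$
that makes $A\mathbf{x}$ the closest point in $\operatorname{Col} A$ to $\mathbf{b}$. See Fig. 1. (Of course, if $\mathbf{b}$ happens to be
in $\operatorname{Col} A$, then $\mathbf{b}$ is $A\mathbf{x}$ for some $\mathbf{x}$, and such an $\mathbf{x}$ is a "least-squares solution.")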



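FIGURE 1 The vector $\mathbf{b}$ is closer to $A\hat{\mathbf{x}}$ than
to $A\mathbf{x}$ for other $\mathbf{x}$.

Solution of the General Least-Squares Problem

Given $A$ and $\mathbf{b}$ as above, apply the Best Approximation Theorem in Section 6.3 to the
subspace $\operatorname{Col} A$. Let

\[
\hat{\mathbf{b}} = \operatorname{proj}_{\operatorname{Col} A} \mathbf{b}
\]

Because $\hat{\mathbf{b}}$ is in the column space of $A$, the equation $A\mathbf{x} = \hat{\mathbf{b}}$ is consistent, and there is
an $\hat{\mathbf{x}}$ in $\mathbb{R}^n$ such that

\[
A\hat{\mathbf{x}} = \hat{\mathbf{b}} \qquad (1)
\]

Since $\hat{\mathbf{b}}$ is the closest point in $\operatorname{Col} A$ to $\mathbf{b}$, a vector $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$
if and only if $\hat{\mathbf{x}}$ satisfies (1). Such an $\hat{\mathbf{x}}$ in $\mathbb{R}^n$ is a list of weights that will build $\hat{\mathbf{b}}$ out of
the columns of $A$. See Fig. 2. [There are many solutions of (1) if the equation has free
variables.]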



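FIGURE 2 The least-squares solution $\hat{\mathbf{x}}$ is in $\mathbb{R}^n$.


Suppose $\hat{\mathbf{x}}$ satisfies $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$. By the Orthogonal Decomposition Theorem in
Section 6.3, the projection $\hat{\mathbf{b}}$ has the property that $\mathbf{b} - \hat{\mathbf{b}}$ is orthogonal to $\operatorname{Col} A$, so $\mathbf{b} - A\hat{\mathbf{x}}$
is orthogonal to each column of $A$. If $\mathbf{a}_j$ is any column of $A$, then $\mathbf{a}_j \cdot (\mathbf{b} - A\hat{\mathbf{x}}) = 0$,
and $\mathbf{a}_j^T(\mathbf{b} - A\hat{\mathbf{x}}) = 0$. Since each $\mathbf{a}_j^T$ is a row of $A^T$,

\[
A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0} \qquad (2)
\]

(This equation also follows from Theorem 3 in Section 6.1.) Thus

\[
A^T\mathbf{b} - A^TA\hat{\mathbf{x}} = \mathbf{0}
\]

\[
A^TA\hat{\mathbf{x}} = A^T\mathbf{b}
\]

These calculations show that each least-squares solution of $A\mathbf{x} = \mathbf{b}$ satisfies the equation

\[
A^TA\mathbf{x} = A^T\mathbf{b} \qquad (3)
\]

The matrix equation (3) represents a system of equations called the normal equations
for $A\mathbf{x} = \mathbf{b}$. A solution of (3) is often denoted by $\hat{\mathbf{x}}$.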


THEOREM 13 The set of least-squares solutions of $A\mathbf{x} = \mathbf{b}$ coincides with the nonempty set of
solutions of the normal equations $A^TA\mathbf{x} = A^T\mathbf{b}$.


PROOF As shown above, the set of least-squares solutions is nonempty and each
least-squares solution $\hat{\mathbf{x}}$ satisfies the normal equations. Conversely, suppose $\hat{\mathbf{x}}$ satisfies
$A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$. Then $\hat{\mathbf{x}}$ satisfies (2) above, which shows that $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to the
rows of $A^T$ and hence is orthogonal to the columns of $A$. Since the columns of $A$ span
$\operatorname{Col} A$, the vector $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to all of $\operatorname{Col} A$. Hence the equation

\[
\mathbf{b} = A\hat{\mathbf{x}} + (\mathbf{b} - A\hat{\mathbf{x}})
\]

is a decomposition of $\mathbf{b}$ into the sum of a vector in $\operatorname{Col} A$ and a vector orthogonal to
$\operatorname{Col} A$. By the uniqueness of the orthogonal decomposition, $A\hat{\mathbf{x}}$ must be the orthogonal
projection of $\mathbf{b}$ onto $\operatorname{Col} A$. That is, $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$, and $\hat{\mathbf{x}}$ is a least-squares solution. ■


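EXAMPLE 1 Find a least-squares solution of the inconsistent system $A\mathbf{x} = \mathbf{b}$ for

\[
A = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}
\]

SOLUTION To use normal equations (3), compute:

\[
A^TA = \begin{bmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix}
\]

\[
A^T\mathbf{b} = \begin{bmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}
= \begin{bmatrix} 19 \\ 11 \end{bmatrix}
\]

Then the equation $A^TA\mathbf{x} = A^T\mathbf{b}$ becomes

\[
\begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 19 \\ 11 \end{bmatrix}
\]

Row operations can be used to solve this system, but since $A^TA$ is invertible and $2 \times 2$,
it is probably faster to compute

\[
(A^TA)^{-1} = \frac{1}{84} \begin{bmatrix} 5 & -1 \\ -1 & 17 \end{bmatrix}
\]

and then to solve $A^TA\mathbf{x} = A^T\mathbf{b}$ as

\[
\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}
= \frac{1}{84} \begin{bmatrix} 5 & -1 \\ -1 & 17 \end{bmatrix}
\begin{bmatrix} 19 \\ 11 \end{bmatrix}
= \frac{1}{84} \begin{bmatrix} 84 \\ 168 \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\]

■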
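
The arithmetic in Example 1 can be checked with exact rational arithmetic. The sketch below forms $A^TA$ and $A^T\mathbf{b}$ and then applies the $2 \times 2$ inverse formula; it handles only the case of two columns with $A^TA$ invertible, and the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Data from Example 1.
A = [[4, 0], [0, 2], [1, 1]]
b = [2, 0, 11]

# Rows of A^T are the columns of A.
At = [list(col) for col in zip(*A)]

# Normal equations: (A^T A) x = A^T b.
AtA = [[sum(p * q for p, q in zip(ri, rj)) for rj in At] for ri in At]
Atb = [sum(p * q for p, q in zip(ri, b)) for ri in At]

# Solve the 2 x 2 system with the adjugate formula for the inverse.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x_hat = [
    Fraction(AtA[1][1] * Atb[0] - AtA[0][1] * Atb[1], det),
    Fraction(-AtA[1][0] * Atb[0] + AtA[0][0] * Atb[1], det),
]
```

Using `Fraction` keeps the result exact, so `x_hat` can be compared directly with the hand computation.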


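In many calculations, $A^TA$ is invertible, but this is not always the case. The next
example involves a matrix of the sort that appears in what are called analysis of variance
problems in statistics.


EXAMPLE 2 Find a least-squares solution of $A\mathbf{x} = \mathbf{b}$ for

\[
A = \begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} -3 \\ -1 \\ 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}
\]

SOLUTION Compute

\[
A^TA = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{bmatrix}
= \begin{bmatrix}
6 & 2 & 2 & 2 \\
2 & 2 & 0 & 0 \\
2 & 0 & 2 & 0 \\
2 & 0 & 0 & 2
\end{bmatrix}
\]

\[
A^T\mathbf{b} = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix}
\begin{bmatrix} -3 \\ -1 \\ 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}
= \begin{bmatrix} 4 \\ -4 \\ 2 \\ 6 \end{bmatrix}
\]

The augmented matrix for $A^TA\mathbf{x} = A^T\mathbf{b}$ is

\[
\begin{bmatrix}
6 & 2 & 2 & 2 & 4 \\
2 & 2 & 0 & 0 & -4 \\
2 & 0 & 2 & 0 & 2 \\
2 & 0 & 0 & 2 & 6
\end{bmatrix}
\sim
\begin{bmatrix}
1 & 0 & 0 & 1 & 3 \\
0 & 1 & 0 & -1 & -5 \\
0 & 0 & 1 & -1 & -2 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

The general solution is $x_1 = 3 - x_4$, $x_2 = -5 + x_4$, $x_3 = -2 + x_4$, and $x_4$ is free. So
the general least-squares solution of $A\mathbf{x} = \mathbf{b}$ has the form

\[
\hat{\mathbf{x}} = \begin{bmatrix} 3 \\ -5 \\ -2 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\]

■


The next theorem gives useful criteria for determining when there is only one
least-squares solution of $A\mathbf{x} = \mathbf{b}$. (Of course, the orthogonal projection $\hat{\mathbf{b}}$ is always unique.)

THEOREM 14 Let $A$ be an $m \times n$ matrix. The following statements are logically equivalent:

a. The equation $A\mathbf{x} = \mathbf{b}$ has a unique least-squares solution for each $\mathbf{b}$ in $\mathbb{R}^m$.

b. The columns of $A$ are linearly independent.

c. The matrix $A^TA$ is invertible.

When these statements are true, the least-squares solution $\hat{\mathbf{x}}$ is given by

\[
\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b} \qquad (4)
\]


The main elements of a proof of Theorem 14 are outlined in Exercises 19-21, which
also review concepts from Chapter 4. Formula (4) for $\hat{\mathbf{x}}$ is useful mainly for theoretical
purposes and for hand calculations when $A^TA$ is a $2 \times 2$ invertible matrix.

When a least-squares solution $\hat{\mathbf{x}}$ is used to produce $A\hat{\mathbf{x}}$ as an approximation to $\mathbf{b}$,
the distance from $\mathbf{b}$ to $A\hat{\mathbf{x}}$ is called the least-squares error of this approximation.

EXAMPLE 3 Given $A$ and $\mathbf{b}$ as in Example 1, determine the least-squares error in
the least-squares solution of $A\mathbf{x} = \mathbf{b}$.

SOLUTION From Example 1,

\[
\mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}
\quad \text{and} \quad
A\hat{\mathbf{x}} = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix}
\]

Hence

\[
\mathbf{b} - A\hat{\mathbf{x}} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}
- \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix}
= \begin{bmatrix} -2 \\ -4 \\ 8 \end{bmatrix}
\]

and

\[
\|\mathbf{b} - A\hat{\mathbf{x}}\| = \sqrt{(-2)^2 + (-4)^2 + 8^2} = \sqrt{84}
\]

The least-squares error is $\sqrt{84}$. For any $\mathbf{x}$ in $\mathbb{R}^2$, the distance between $\mathbf{b}$ and the vector
$A\mathbf{x}$ is at least $\sqrt{84}$. See Fig. 3. Note that the least-squares solution $\hat{\mathbf{x}}$ itself does not
appear in the figure. ■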
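
The least-squares error in Example 3 can be checked numerically, along with the claim that no other $\mathbf{x}$ brings $A\mathbf{x}$ closer to $\mathbf{b}$. The sketch below is only a spot check on a few trial vectors, which are arbitrary choices made for illustration; the helper name `distance` is likewise an assumption.

```python
import math

# Data from Example 1 and its least-squares solution.
A = [[4, 0], [0, 2], [1, 1]]
b = [2, 0, 11]
x_hat = [1, 2]


def distance(x):
    """Return ||b - Ax|| for a candidate vector x."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return math.sqrt(sum((bi - ai) ** 2 for bi, ai in zip(b, Ax)))


# Residual vector b - A x_hat and the least-squares error.
Ax_hat = [sum(a * xi for a, xi in zip(row, x_hat)) for row in A]
residual = [bi - ai for bi, ai in zip(b, Ax_hat)]
error = distance(x_hat)

# Distances for a few other (arbitrarily chosen) vectors x.
trial_distances = [distance(x) for x in [(0, 0), (1, 1), (2, 2), (1.1, 2)]]
```

Every entry of `trial_distances` is at least as large as `error`, in line with the definition of a least-squares solution.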


Alternative Calculations of Least-Squares Solutions 

The next example shows how to find a least-squares solution of Ax = b when the 
columns of A are orthogonal. Such matrices often appear in linear regression problems, 
discussed in the next section. 


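EXAMPLE 4 Find a least-squares solution of $A\mathbf{x} = \mathbf{b}$ for

\[
A = \begin{bmatrix} 1 & -6 \\ 1 & -2 \\ 1 & 1 \\ 1 & 7 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} -1 \\ 2 \\ 1 \\ 6 \end{bmatrix}
\]

SOLUTION Because the columns $\mathbf{a}_1$ and $\mathbf{a}_2$ of $A$ are orthogonal, the orthogonal
projection of $\mathbf{b}$ onto $\operatorname{Col} A$ is given by

\[
\hat{\mathbf{b}} = \frac{\mathbf{b} \cdot \mathbf{a}_1}{\mathbf{a}_1 \cdot \mathbf{a}_1}\mathbf{a}_1
+ \frac{\mathbf{b} \cdot \mathbf{a}_2}{\mathbf{a}_2 \cdot \mathbf{a}_2}\mathbf{a}_2
= \frac{8}{4}\mathbf{a}_1 + \frac{45}{90}\mathbf{a}_2 \qquad (5)
\]

\[
= \begin{bmatrix} 2 \\ 2 \\ 2 \\ 2 \end{bmatrix}
+ \begin{bmatrix} -3 \\ -1 \\ 1/2 \\ 7/2 \end{bmatrix}
= \begin{bmatrix} -1 \\ 1 \\ 5/2 \\ 11/2 \end{bmatrix}
\]

Now that $\hat{\mathbf{b}}$ is known, we can solve $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$. But this is trivial, since we already
know what weights to place on the columns of $A$ to produce $\hat{\mathbf{b}}$. It is clear from (5) that

\[
\hat{\mathbf{x}} = \begin{bmatrix} 8/4 \\ 45/90 \end{bmatrix}
= \begin{bmatrix} 2 \\ 1/2 \end{bmatrix}
\]

■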
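
When the columns of $A$ are orthogonal, the weights in (5) come from dot products alone, with no system to solve. The sketch below illustrates this. The columns `a1` and `a2` are read off from the two vectors $2\mathbf{a}_1$ and $\tfrac{1}{2}\mathbf{a}_2$ that are added in the example, while the vector `b` is an assumption chosen to be consistent with the quotients $8/4$ and $45/90$ shown in (5).

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(p * q for p, q in zip(u, v))


# Orthogonal columns of A (recovered from the vectors added in (5)).
a1 = [1, 1, 1, 1]
a2 = [-6, -2, 1, 7]
# Assumed right-hand side, consistent with b.a1 = 8 and b.a2 = 45.
b = [-1, 2, 1, 6]

# Weights of the orthogonal projection; these are also the entries of x_hat.
w1 = Fraction(dot(b, a1), dot(a1, a1))
w2 = Fraction(dot(b, a2), dot(a2, a2))
x_hat = [w1, w2]

# Orthogonal projection of b onto Col A.
b_hat = [w1 * p + w2 * q for p, q in zip(a1, a2)]
```

The shortcut depends on `dot(a1, a2)` being zero; for non-orthogonal columns the normal equations or a QR factorization are needed instead.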


In some cases, the normal equations for a least-squares problem can be
ill-conditioned; that is, small errors in the calculations of the entries of $A^TA$ can sometimes
cause relatively large errors in the solution $\hat{\mathbf{x}}$. If the columns of $A$ are linearly
independent, the least-squares solution can often be computed more reliably through
a QR factorization of $A$ (described in Section 6.4).¹


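1 The QR method is compared with the standard normal equation method in G. Golub and C. Van Loan,
Matrix Computations, 3rd ed. (Baltimore: Johns Hopkins Press, 1996), pp. 230-231.

EXAMPLE 5 Find the least-squares solution of $A\mathbf{x} = \mathbf{b}$ for

\[
A = \begin{bmatrix} 1 & 3 & 5 \\ 1 & 1 & 0 \\ 1 & 1 & 2 \\ 1 & 3 & 3 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 3 \\ 5 \\ 7 \\ -3 \end{bmatrix}
\]

SOLUTION The QR factorization of $A$ can be obtained as in Section 6.4.

\[
A = QR = \begin{bmatrix}
1/2 & 1/2 & 1/2 \\
1/2 & -1/2 & -1/2 \\
1/2 & -1/2 & 1/2 \\
1/2 & 1/2 & -1/2
\end{bmatrix}
\begin{bmatrix} 2 & 4 & 5 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{bmatrix}
\]

Then

\[
Q^T\mathbf{b} = \begin{bmatrix}
1/2 & 1/2 & 1/2 & 1/2 \\
1/2 & -1/2 & -1/2 & 1/2 \\
1/2 & -1/2 & 1/2 & -1/2
\end{bmatrix}
\begin{bmatrix} 3 \\ 5 \\ 7 \\ -3 \end{bmatrix}
= \begin{bmatrix} 6 \\ -6 \\ 4 \end{bmatrix}
\]

The least-squares solution $\hat{\mathbf{x}}$ satisfies $R\mathbf{x} = Q^T\mathbf{b}$; that is,

\[
\begin{bmatrix} 2 & 4 & 5 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 6 \\ -6 \\ 4 \end{bmatrix}
\]

This equation is solved easily and yields
$\hat{\mathbf{x}} = \begin{bmatrix} 10 \\ -6 \\ 2 \end{bmatrix}$. ■


THEOREM 15


Given an $m \times n$ matrix $A$ with linearly independent columns, let $A = QR$ be a
QR factorization of $A$ as in Theorem 12. Then, for each $\mathbf{b}$ in $\mathbb{R}^m$, the equation
$A\mathbf{x} = \mathbf{b}$ has a unique least-squares solution, given by

\[
\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b} \qquad (6)
\]


PROOF Let $\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}$. Then

\[
A\hat{\mathbf{x}} = QR\hat{\mathbf{x}} = QRR^{-1}Q^T\mathbf{b} = QQ^T\mathbf{b}
\]

By Theorem 12, the columns of $Q$ form an orthonormal basis for $\operatorname{Col} A$. Hence, by
Theorem 10, $QQ^T\mathbf{b}$ is the orthogonal projection $\hat{\mathbf{b}}$ of $\mathbf{b}$ onto $\operatorname{Col} A$. Then $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$,
which shows that $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$. The uniqueness of $\hat{\mathbf{x}}$ follows
from Theorem 14. ■


NUMERICAL NOTE


Since $R$ in Theorem 15 is upper triangular, $\hat{\mathbf{x}}$ should be calculated as the exact
solution of the equation

\[
R\mathbf{x} = Q^T\mathbf{b} \qquad (7)
\]

It is much faster to solve (7) by back-substitution or row operations than to
compute $R^{-1}$ and use (6).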
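
Back-substitution for an upper triangular system such as (7) can be sketched as follows, using the matrix $R$ and the vector $Q^T\mathbf{b}$ from Example 5. The function name `back_substitute` is an assumption introduced here; the sketch presumes that the diagonal entries of $R$ are nonzero, as Theorem 12 guarantees.

```python
from fractions import Fraction


def back_substitute(R, c):
    """Solve Rx = c for an upper triangular R with nonzero diagonal."""
    n = len(c)
    x = [Fraction(0)] * n
    # Work upward from the last equation, which involves only x_n.
    for i in range(n - 1, -1, -1):
        known = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - known) / Fraction(R[i][i])
    return x


# R and Q^T b from Example 5.
R = [[2, 4, 5], [0, 2, 3], [0, 0, 2]]
Qtb = [6, -6, 4]

x_hat = back_substitute(R, Qtb)
```

Each unknown is found with a handful of multiplications and one division, so $R^{-1}$ is never formed.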


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13. Let A 


3 4 


11 


5" 

1 

-2 1 

, b = 

-9 

, u = 

3 4 


5 


—丄 


,and y : 


Compute $A\mathbf{u}$ and $A\mathbf{v}$, and compare them with $\mathbf{b}$.
Could $\mathbf{u}$ possibly be a least-squares solution of $A\mathbf{x} = \mathbf{b}$?
(Answer this without computing a least-squares solution.)


14. Let A 


2 

r 


"5" 


4" 

c 

-3 

—4 

, b = 

4 

, u = 

3 

2 


4 


— J 


and y : 


Compute $A\mathbf{u}$ and $A\mathbf{v}$, and compare them with $\mathbf{b}$. Is
it possible that at least one of $\mathbf{u}$ or $\mathbf{v}$ could be a least-squares
solution of $A\mathbf{x} = \mathbf{b}$? (Answer this without computing a
least-squares solution.)

In Exercises 15 and 16, use the factorization $A = QR$ to find the
least-squares solution of $A\mathbf{x} = \mathbf{b}$.


In Exercises 17 and 18, $A$ is an $m \times n$ matrix and $\mathbf{b}$ is in $\mathbb{R}^m$. Mark
each statement True or False. Justify each answer.

17. a. The general least-squares problem is to find an $\mathbf{x}$ that
makes $A\mathbf{x}$ as close as possible to $\mathbf{b}$.

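PRACTICE PROBLEMS

1. Let $A = \begin{bmatrix} 1 & -3 & -3 \\ 1 & 5 & 1 \\ 1 & 7 & 2 \end{bmatrix}$ and
$\mathbf{b} = \begin{bmatrix} 5 \\ -3 \\ -5 \end{bmatrix}$. Find a least-squares solution of $A\mathbf{x} = \mathbf{b}$,
and compute the associated least-squares error.

2. What can you say about the least-squares solution of $A\mathbf{x} = \mathbf{b}$ when $\mathbf{b}$ is orthogonal
to the columns of $A$?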


6.5 EXERCISES 


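In Exercises 1-4, find a least-squares solution of $A\mathbf{x} = \mathbf{b}$ by
(a) constructing the normal equations for $\hat{\mathbf{x}}$ and (b) solving for
$\hat{\mathbf{x}}$.

1. $A = \begin{bmatrix} -1 & 2 \\ 2 & -3 \\ -1 & 3 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix}$

2. $A = \begin{bmatrix} 2 & 1 \\ -2 & 0 \\ 2 & 3 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} -5 \\ 8 \\ 1 \end{bmatrix}$

3. $A = \begin{bmatrix} 1 & -2 \\ -1 & 2 \\ 0 & 3 \\ 2 & 5 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 3 \\ 1 \\ -4 \\ 2 \end{bmatrix}$

4. $A = \begin{bmatrix} 1 & 3 \\ 1 & -1 \\ 1 & 1 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 5 \\ 1 \\ 0 \end{bmatrix}$

In Exercises 5 and 6, describe all least-squares solutions of the
equation $A\mathbf{x} = \mathbf{b}$.

5. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 1 \\ 3 \\ 8 \\ 2 \end{bmatrix}$

6. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 7 \\ 2 \\ 3 \\ 6 \\ 5 \\ 4 \end{bmatrix}$

7. Compute the least-squares error associated with the
least-squares solution found in Exercise 3.

8. Compute the least-squares error associated with the
least-squares solution found in Exercise 4.

In Exercises 9-12, find (a) the orthogonal projection of $\mathbf{b}$ onto
$\operatorname{Col} A$ and (b) a least-squares solution of $A\mathbf{x} = \mathbf{b}$.

9. $A = \begin{bmatrix} 1 & 5 \\ 3 & 1 \\ -2 & 4 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 4 \\ -2 \\ -3 \end{bmatrix}$

10. $A = \begin{bmatrix} 1 & 2 \\ -1 & 4 \\ 1 & 2 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix}$

11. $A = \begin{bmatrix} 4 & 0 & 1 \\ 1 & -5 & 1 \\ 6 & 1 & 0 \\ 1 & -1 & -5 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 9 \\ 0 \\ 0 \\ 0 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \\ -1 & 1 & -1 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} 2 \\ 5 \\ 6 \\ 6 \end{bmatrix}$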
b. A least-squares solution of $A\mathbf{x} = \mathbf{b}$ is a vector $\hat{\mathbf{x}}$ that
satisfies $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ is the orthogonal projection of
$\mathbf{b}$ onto $\operatorname{Col} A$.

c. A least-squares solution of $A\mathbf{x} = \mathbf{b}$ is a vector $\hat{\mathbf{x}}$ such that
$\|\mathbf{b} - A\mathbf{x}\| \le \|\mathbf{b} - A\hat{\mathbf{x}}\|$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

d. Any solution of $A^TA\mathbf{x} = A^T\mathbf{b}$ is a least-squares solution
of $A\mathbf{x} = \mathbf{b}$.

e. If the columns of $A$ are linearly independent, then the
equation $A\mathbf{x} = \mathbf{b}$ has exactly one least-squares solution.

18. a. If $\mathbf{b}$ is in the column space of $A$, then every solution of
$A\mathbf{x} = \mathbf{b}$ is a least-squares solution.

b. The least-squares solution of $A\mathbf{x} = \mathbf{b}$ is the point in the
column space of $A$ closest to $\mathbf{b}$.

c. A least-squares solution of $A\mathbf{x} = \mathbf{b}$ is a list of weights
that, when applied to the columns of $A$, produces the
orthogonal projection of $\mathbf{b}$ onto $\operatorname{Col} A$.

d. If $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$, then
$\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}$.

e. The normal equations always provide a reliable method
for computing least-squares solutions.

f. If $A$ has a QR factorization, say $A = QR$, then the best
way to find the least-squares solution of $A\mathbf{x} = \mathbf{b}$ is to
compute $\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}$.

19. Let $A$ be an $m \times n$ matrix. Use the steps below to show that a
vector $\mathbf{x}$ in $\mathbb{R}^n$ satisfies $A\mathbf{x} = \mathbf{0}$ if and only if $A^TA\mathbf{x} = \mathbf{0}$. This
will show that $\operatorname{Nul} A = \operatorname{Nul} A^TA$.

a. Show that if $A\mathbf{x} = \mathbf{0}$, then $A^TA\mathbf{x} = \mathbf{0}$.

b. Suppose $A^TA\mathbf{x} = \mathbf{0}$. Explain why $\mathbf{x}^TA^TA\mathbf{x} = 0$, and use
this to show that $A\mathbf{x} = \mathbf{0}$.

20. Let $A$ be an $m \times n$ matrix such that $A^TA$ is invertible. Show
that the columns of $A$ are linearly independent. [Careful:
You may not assume that $A$ is invertible; it may not even
be square.]

21. Let $A$ be an $m \times n$ matrix whose columns are linearly
independent. [Careful: $A$ need not be square.]

a. Use Exercise 19 to show that $A^TA$ is an invertible matrix.

b. Explain why $A$ must have at least as many rows as
columns.

c. Determine the rank of $A$.

22. Use Exercise 19 to show that $\operatorname{rank} A^TA = \operatorname{rank} A$. [Hint:
How many columns does $A^TA$ have? How is this connected
with the rank of $A^TA$?]

23. Suppose $A$ is $m \times n$ with linearly independent columns and
$\mathbf{b}$ is in $\mathbb{R}^m$. Use the normal equations to produce a formula
for $\hat{\mathbf{b}}$, the projection of $\mathbf{b}$ onto $\operatorname{Col} A$. [Hint: Find $\hat{\mathbf{x}}$ first. The
formula does not require an orthogonal basis for $\operatorname{Col} A$.]


24. Find a formula for the least-squares solution of $A\mathbf{x} = \mathbf{b}$ when
the columns of $A$ are orthonormal.

25. Describe all least-squares solutions of the system

\[
\begin{aligned}
x + y &= 2 \\
x + y &= 4
\end{aligned}
\]

25. Describe all least-squares solutions of the system 


x -\- y = 2 
x y = 4 


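26. [M] Example 3 in Section 4.8 displayed a low-pass linear
filter that changed a signal $\{y_k\}$ into $\{y_{k+1}\}$ and changed a
higher-frequency signal $\{w_k\}$ into the zero signal, where
$y_k = \cos(\pi k/4)$ and $w_k = \cos(3\pi k/4)$. The following
calculations will design a filter with approximately those
properties. The filter equation is

\[
a_0 y_{k+2} + a_1 y_{k+1} + a_2 y_k = z_k \quad \text{for all } k \qquad (8)
\]

Because the signals are periodic, with period 8, it suffices
to study equation (8) for $k = 0, \ldots, 7$. The action on the
two signals described above translates into two sets of eight
equations, shown below. The rows correspond to $k = 0, 1, \ldots, 7$.
The columns of the first coefficient matrix hold $y_{k+2}$, $y_{k+1}$, $y_k$,
with right side $y_{k+1}$; the columns of the second hold
$w_{k+2}$, $w_{k+1}$, $w_k$, with right side zero:

\[
\begin{bmatrix}
0 & .7 & 1 \\
-.7 & 0 & .7 \\
-1 & -.7 & 0 \\
-.7 & -1 & -.7 \\
0 & -.7 & -1 \\
.7 & 0 & -.7 \\
1 & .7 & 0 \\
.7 & 1 & .7
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
= \begin{bmatrix} .7 \\ 0 \\ -.7 \\ -1 \\ -.7 \\ 0 \\ .7 \\ 1 \end{bmatrix},
\qquad
\begin{bmatrix}
0 & -.7 & 1 \\
.7 & 0 & -.7 \\
-1 & .7 & 0 \\
.7 & -1 & .7 \\
0 & .7 & -1 \\
-.7 & 0 & .7 \\
1 & -.7 & 0 \\
-.7 & 1 & -.7
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\]

Write an equation $A\mathbf{x} = \mathbf{b}$, where $A$ is a $16 \times 3$ matrix
formed from the two coefficient matrices above and where $\mathbf{b}$
in $\mathbb{R}^{16}$ is formed from the two right sides of the equations.
Find $a_0$, $a_1$, and $a_2$ given by the least-squares solution of
$A\mathbf{x} = \mathbf{b}$. (The .7 in the data above was used as an
approximation for $\sqrt{2}/2$, to illustrate how a typical computation
in an applied problem might proceed. If .707 were used
instead, the resulting filter coefficients would agree to at least
seven decimal places with $\sqrt{2}/4$, $1/2$, and $\sqrt{2}/4$, the values
produced by exact arithmetic calculations.)

SOLUTIONS TO PRACTICE PROBLEMS


1. First, compute

\[
A^TA = \begin{bmatrix} 1 & 1 & 1 \\ -3 & 5 & 7 \\ -3 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 1 & -3 & -3 \\ 1 & 5 & 1 \\ 1 & 7 & 2 \end{bmatrix}
= \begin{bmatrix} 3 & 9 & 0 \\ 9 & 83 & 28 \\ 0 & 28 & 14 \end{bmatrix}
\]

\[
A^T\mathbf{b} = \begin{bmatrix} 1 & 1 & 1 \\ -3 & 5 & 7 \\ -3 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 5 \\ -3 \\ -5 \end{bmatrix}
= \begin{bmatrix} -3 \\ -65 \\ -28 \end{bmatrix}
\]

Next, row reduce the augmented matrix for the normal equations, $A^TA\mathbf{x} = A^T\mathbf{b}$:

\[
\begin{bmatrix} 3 & 9 & 0 & -3 \\ 9 & 83 & 28 & -65 \\ 0 & 28 & 14 & -28 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 3 & 0 & -1 \\ 0 & 56 & 28 & -56 \\ 0 & 28 & 14 & -28 \end{bmatrix}
\sim \cdots \sim
\begin{bmatrix} 1 & 0 & -3/2 & 2 \\ 0 & 1 & 1/2 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

The general least-squares solution is $x_1 = 2 + \tfrac{3}{2}x_3$, $x_2 = -1 - \tfrac{1}{2}x_3$, with $x_3$ free.

For one specific solution, take $x_3 = 0$ (for example), and get

\[
\hat{\mathbf{x}} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}
\]

To find the least-squares error, compute

\[
\hat{\mathbf{b}} = A\hat{\mathbf{x}} = \begin{bmatrix} 1 & -3 & -3 \\ 1 & 5 & 1 \\ 1 & 7 & 2 \end{bmatrix}
\begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 5 \\ -3 \\ -5 \end{bmatrix}
\]

It turns out that $\hat{\mathbf{b}} = \mathbf{b}$, so $\|\mathbf{b} - \hat{\mathbf{b}}\| = 0$. The least-squares error is zero because $\mathbf{b}$
happens to be in $\operatorname{Col} A$.

2. If $\mathbf{b}$ is orthogonal to the columns of $A$, then the projection of $\mathbf{b}$ onto the column space
of $A$ is $\mathbf{0}$. In this case, a least-squares solution $\hat{\mathbf{x}}$ of $A\mathbf{x} = \mathbf{b}$ satisfies $A\hat{\mathbf{x}} = \mathbf{0}$.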


6.6 APPLICATIONS TO LINEAR MODELS 

A common task in science and engineering is to analyze and understand relationships 
among several quantities that vary. This section describes a variety of situations in 
which data are used to build or verify a formula that predicts the value of one variable 
as a function of other variables. In each case, the problem will amount to solving a 
least-squares problem. 

For easy application of the discussion to real problems that you may encounter later
in your career, we choose notation that is commonly used in the statistical analysis of
scientific and engineering data. Instead of $A\mathbf{x} = \mathbf{b}$, we write $X\boldsymbol{\beta} = \mathbf{y}$ and refer to $X$ as
the design matrix, $\boldsymbol{\beta}$ as the parameter vector, and $\mathbf{y}$ as the observation vector.

Least-Squares Lines

The simplest relation between two variables $x$ and $y$ is the linear equation
$y = \beta_0 + \beta_1 x$.¹ Experimental data often produce points $(x_1, y_1), \ldots, (x_n, y_n)$ that,
when graphed, seem to lie close to a line. We want to determine the parameters $\beta_0$
and $\beta_1$ that make the line as "close" to the points as possible.

1 This notation is commonly used for least-squares lines instead of $y = mx + b$.

Suppose $\beta_0$ and $\beta_1$ are fixed, and consider the line $y = \beta_0 + \beta_1 x$ in Fig. 1.
Corresponding to each data point $(x_j, y_j)$ there is a point $(x_j, \beta_0 + \beta_1 x_j)$ on the line
with the same $x$-coordinate. We call $y_j$ the observed value of $y$ and $\beta_0 + \beta_1 x_j$ the
predicted $y$-value (determined by the line). The difference between an observed
$y$-value and a predicted $y$-value is called a residual.

FIGURE 1 Fitting a line to experimental data.

There are several ways to measure how "close" the line is to the data. The usual
choice (primarily because the mathematical calculations are simple) is to add the squares
of the residuals. The least-squares line is the line $y = \beta_0 + \beta_1 x$ that minimizes the
sum of the squares of the residuals. This line is also called a line of regression of $y$
on $x$, because any errors in the data are assumed to be only in the $y$-coordinates. The
coefficients $\beta_0$, $\beta_1$ of the line are called (linear) regression coefficients.²

If the data points were on the line, the parameters $\beta_0$ and $\beta_1$ would satisfy the
equations

\[
\begin{array}{ccc}
\text{Predicted } y\text{-value} & & \text{Observed } y\text{-value} \\
\beta_0 + \beta_1 x_1 & = & y_1 \\
\beta_0 + \beta_1 x_2 & = & y_2 \\
\vdots & & \vdots \\
\beta_0 + \beta_1 x_n & = & y_n
\end{array}
\]

We can write this system as

\[
X\boldsymbol{\beta} = \mathbf{y}, \quad \text{where} \quad
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\qquad (1)
\]

Of course, if the data points don't lie on a line, then there are no parameters $\beta_0$, $\beta_1$ for
which the predicted $y$-values in $X\boldsymbol{\beta}$ equal the observed $y$-values in $\mathbf{y}$, and $X\boldsymbol{\beta} = \mathbf{y}$ has
no solution. This is a least-squares problem, $A\mathbf{x} = \mathbf{b}$, with different notation!

The square of the distance between the vectors $X\boldsymbol{\beta}$ and $\mathbf{y}$ is precisely the sum of
the squares of the residuals. The $\boldsymbol{\beta}$ that minimizes this sum also minimizes the distance
between $X\boldsymbol{\beta}$ and $\mathbf{y}$. Computing the least-squares solution of $X\boldsymbol{\beta} = \mathbf{y}$ is equivalent to
finding the $\boldsymbol{\beta}$ that determines the least-squares line in Fig. 1.


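2 If the measurement errors are in $x$ instead of $y$, simply interchange the coordinates of the data $(x_j, y_j)$
before plotting the points and computing the regression line. If both coordinates are subject to possible error,
then you might choose the line that minimizes the sum of the squares of the orthogonal (perpendicular)
distances from the points to the line. See the Practice Problems for Section 7.5.

EXAMPLE 1 Find the equation $y = \beta_0 + \beta_1 x$ of the least-squares line that best fits
the data points $(2, 1)$, $(5, 2)$, $(7, 3)$, and $(8, 3)$.


SOLUTION Use the $x$-coordinates of the data to build the design matrix $X$ in (1) and
the $y$-coordinates to build the observation vector $\mathbf{y}$:

\[
X = \begin{bmatrix} 1 & 2 \\ 1 & 5 \\ 1 & 7 \\ 1 & 8 \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 3 \end{bmatrix}
\]

For the least-squares solution of $X\boldsymbol{\beta} = \mathbf{y}$, obtain the normal equations (with the new
notation):

\[
X^TX\boldsymbol{\beta} = X^T\mathbf{y}
\]

That is, compute

\[
X^TX = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 5 & 7 & 8 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 1 & 5 \\ 1 & 7 \\ 1 & 8 \end{bmatrix}
= \begin{bmatrix} 4 & 22 \\ 22 & 142 \end{bmatrix}
\]

\[
X^T\mathbf{y} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 5 & 7 & 8 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 3 \\ 3 \end{bmatrix}
= \begin{bmatrix} 9 \\ 57 \end{bmatrix}
\]

The normal equations are

\[
\begin{bmatrix} 4 & 22 \\ 22 & 142 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
= \begin{bmatrix} 9 \\ 57 \end{bmatrix}
\]

Hence

\[
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
= \begin{bmatrix} 4 & 22 \\ 22 & 142 \end{bmatrix}^{-1}
\begin{bmatrix} 9 \\ 57 \end{bmatrix}
= \frac{1}{84} \begin{bmatrix} 142 & -22 \\ -22 & 4 \end{bmatrix}
\begin{bmatrix} 9 \\ 57 \end{bmatrix}
= \frac{1}{84} \begin{bmatrix} 24 \\ 30 \end{bmatrix}
= \begin{bmatrix} 2/7 \\ 5/14 \end{bmatrix}
\]

Thus the least-squares line has the equation

\[
y = \frac{2}{7} + \frac{5}{14}x
\]

See Fig. 2.

FIGURE 2 The least-squares line $y = \frac{2}{7} + \frac{5}{14}x$.

■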
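
The computation in Example 1 can be packaged as a small function. For a line, the entries of $X^TX$ and $X^T\mathbf{y}$ are just sums over the data, so the sketch below forms those sums and applies the $2 \times 2$ inverse formula. The function name `least_squares_line` is an assumption introduced here, and the sketch presumes that at least two distinct $x$-values occur, so that $X^TX$ is invertible.

```python
from fractions import Fraction


def least_squares_line(points):
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1*x."""
    n = len(points)
    sx = sum(x for x, _ in points)        # sum of x_j
    sy = sum(y for _, y in points)        # sum of y_j
    sxx = sum(x * x for x, _ in points)   # sum of x_j^2
    sxy = sum(x * y for x, y in points)   # sum of x_j * y_j
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = n * sxx - sx * sx
    beta0 = Fraction(sxx * sy - sx * sxy, det)
    beta1 = Fraction(n * sxy - sx * sy, det)
    return beta0, beta1


# Data points from Example 1.
points = [(2, 1), (5, 2), (7, 3), (8, 3)]
beta0, beta1 = least_squares_line(points)

# Residuals: observed y-value minus predicted y-value at each data point.
residuals = [y - (beta0 + beta1 * x) for x, y in points]
```

The residuals sum to zero here, which reflects the fact that the residual vector is orthogonal to the column of ones in the design matrix.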


A common practice before computing a least-squares line is to compute the average 
x of the original x-values and form a new variable x* = x — x. The new x-data are said 
to be in mean-deviation form. In this case, the two columns of the design matrix will 
be orthogonal. Solution of the normal equations is simplified, just as in Example 4 in 
Section 6.5. See Exercises 17 and 18. 
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Describe the linear model that produces a “least-squares fit” of the data by equation (3). 

SOLUTION Equation (3) describes the ideal relationship. Suppose the actual values of 
the parameters are ^2 - Then the coordinates of the first data point (^i, y\) satisfy 

an equation of the form 

Jl = ^0 + Pl x l + ^2 X \ + ^1 

where €\ is the residual error between the observed value y\ and the predicted j-value 
Po + ) 61^1 + p 2 x \ - Each data point determines a similar equation: 


Surface area 
of foliage 

FIGURE 4 

Production of nutrients. 


: Vi = A) + P\X\ + ?>ix\ + €i 

J2 = A) + P\X2 + hx\ + €2 


= A) + P\X n + + € n 


The General Linear Model 

In some applications, it is necessary to fit data points with something other than a straight 
line. In the examples that follow, the matrix equation is still Xfi = y, but the specific 
form of X changes from one problem to the next. Statisticians usually introduce a 
residual vector e, defined by € = y — XP ，and write 

y = xp + e 

Any equation of this form is referred to as a linear model. Once X and y are determined, 
the goal is to minimize the length of 6 , which amounts to finding a least-squares solution 
of = y. In each case, the least-squares solution 0 is a solution of the normal 
equations 

X T XP = X T y 



FIGURE 3 

Average cost curve. 


Least-Squares Fitting of Other Curves 

When data points $(x_1, y_1), \ldots, (x_n, y_n)$ on a scatter plot do not lie close to any line, it
may be appropriate to postulate some other functional relationship between x and y.
The next two examples show how to fit data by curves that have the general form

\[ y = \beta_0 f_0(x) + \beta_1 f_1(x) + \cdots + \beta_k f_k(x) \qquad (2) \]

where $f_0, \ldots, f_k$ are known functions and $\beta_0, \ldots, \beta_k$ are parameters that must be
determined. As we will see, equation (2) describes a linear model because it is linear in
the unknown parameters.

For a particular value of x, (2) gives a predicted, or “fitted,” value of y. The
difference between the observed value and the predicted value is the residual. The
parameters $\beta_0, \ldots, \beta_k$ must be determined so as to minimize the sum of the squares
of the residuals.
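
A model of the form (2) can be fitted by building the design matrix column by column from the known functions and solving the normal equations. The sketch below is one illustrative way to do this with exact arithmetic; the helper names `solve` and `fit` and the data are hypothetical, and the code assumes $X^T X$ is invertible.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero pivot (raises StopIteration if A is singular).
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

def fit(basis, xs, ys):
    """Least-squares weights for y = sum of beta_j * basis[j](x)."""
    X = [[f(x) for f in basis] for x in xs]          # design matrix, one row per data point
    k = len(basis)
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(XtX, Xty)                           # normal equations X^T X beta = X^T y

# Hypothetical data lying exactly on y = 1 + 2x + 3x^2, so the fit recovers 1, 2, 3.
xs = [0, 1, 2, 3]
ys = [1, 6, 17, 34]
beta = fit([lambda x: 1, lambda x: x, lambda x: x * x], xs, ys)
print(beta)
```

Changing the list of functions passed to `fit` changes the form of the design matrix, while the normal equations keep the same matrix form.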

EXAMPLE 2 Suppose data points $(x_1, y_1), \ldots, (x_n, y_n)$ appear to lie along some
sort of parabola instead of a straight line. For instance, if the x-coordinate denotes the
production level for a company, and y denotes the average cost per unit of operating at
a level of x units per day, then a typical average cost curve looks like a parabola that
opens upward (Fig. 3). In ecology, a parabolic curve that opens downward is used to
model the net primary production of nutrients in a plant, as a function of the surface
area of the foliage (Fig. 4). Suppose we wish to approximate the data by an equation of
the form

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 \qquad (3) \]



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FIGURE 5 

Data points along a cubic curve. 


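It is a simple matter to write this system of equations in the form $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$. To find
$X$, inspect the first few rows of the system and look for the pattern.

\[
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{\mathbf{y}}
=
\underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix}}_{X}
\underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}}_{\boldsymbol{\beta}}
+
\underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{\boldsymbol{\epsilon}}
\]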


EXAMPLE 3 If data points tend to follow a pattern such as in Fig. 5, then an
appropriate model might be an equation of the form

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 \]

Such data, for instance, could come from a company’s total costs, as a function of the
level of production. Describe the linear model that gives a least-squares fit of this type
to data $(x_1, y_1), \ldots, (x_n, y_n)$.


SOLUTION By an analysis similar to that in Example 2, we obtain 


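\[
\underbrace{\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{\text{Observation vector}},
\quad
\underbrace{X = \begin{bmatrix} 1 & x_1 & x_1^2 & x_1^3 \\ 1 & x_2 & x_2^2 & x_2^3 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 & x_n^3 \end{bmatrix}}_{\text{Design matrix}},
\quad
\underbrace{\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}}_{\text{Parameter vector}},
\quad
\underbrace{\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{\text{Residual vector}}
\]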


■ 


Multiple Regression 

Suppose an experiment involves two independent variables, say, u and v, and one
dependent variable, y. A simple equation for predicting y from u and v has the form

\[ y = \beta_0 + \beta_1 u + \beta_2 v \qquad (4) \]

A more general prediction equation might have the form

\[ y = \beta_0 + \beta_1 u + \beta_2 v + \beta_3 u^2 + \beta_4 uv + \beta_5 v^2 \qquad (5) \]

This equation is used in geology, for instance, to model erosion surfaces, glacial cirques,
soil pH, and other quantities. In such cases, the least-squares fit is called a trend surface.

Equations (4) and (5) both lead to a linear model because they are linear in the
unknown parameters (even though u and v are multiplied). In general, a linear model
will arise whenever y is to be predicted by an equation of the form

\[ y = \beta_0 f_0(u, v) + \beta_1 f_1(u, v) + \cdots + \beta_k f_k(u, v) \]

with $f_0, \ldots, f_k$ any sort of known functions and $\beta_0, \ldots, \beta_k$ unknown weights.

EXAMPLE 4 In geography, local models of terrain are constructed from data
$(u_1, v_1, y_1), \ldots, (u_n, v_n, y_n)$, where $u_j$, $v_j$, and $y_j$ are latitude, longitude, and altitude,
respectively. Describe the linear model based on (4) that gives a least-squares fit to such
data. The solution is called the least-squares plane. See Fig. 6.
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FIGURE 6 A least-squares plane. 


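SOLUTION We expect the data to satisfy the following equations:

\[
\begin{aligned}
y_1 &= \beta_0 + \beta_1 u_1 + \beta_2 v_1 + \epsilon_1 \\
y_2 &= \beta_0 + \beta_1 u_2 + \beta_2 v_2 + \epsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 u_n + \beta_2 v_n + \epsilon_n
\end{aligned}
\]

This system has the matrix form $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where

\[
\underbrace{\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{\text{Observation vector}},
\quad
\underbrace{X = \begin{bmatrix} 1 & u_1 & v_1 \\ 1 & u_2 & v_2 \\ \vdots & \vdots & \vdots \\ 1 & u_n & v_n \end{bmatrix}}_{\text{Design matrix}},
\quad
\underbrace{\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}}_{\text{Parameter vector}},
\quad
\underbrace{\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{\text{Residual vector}}
\]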

■ 


Example 4 shows that the linear model for multiple regression has the same abstract
form as the model for the simple regression in the earlier examples. Linear algebra gives
us the power to understand the general principle behind all the linear models. Once $X$
is defined properly, the normal equations for $\boldsymbol{\beta}$ have the same matrix form, no matter
how many variables are involved. Thus, for any linear model where $X^T X$ is invertible,
the least-squares $\hat{\boldsymbol{\beta}}$ is given by $(X^T X)^{-1} X^T \mathbf{y}$.
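
As an illustration of the least-squares plane, the sketch below fits $y = \beta_0 + \beta_1 u + \beta_2 v$ to four data points by solving the normal equations exactly. The data and the helper names `solve` and `least_squares_plane` are hypothetical and are not taken from the text; solving the normal equations by elimination is used here in place of forming the inverse explicitly.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

def least_squares_plane(points):
    """points is a list of (u, v, y); returns [beta0, beta1, beta2]."""
    X = [[1, u, v] for u, v, _ in points]            # design matrix with rows (1, u_j, v_j)
    y = [yj for _, _, yj in points]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * yj for row, yj in zip(X, y)) for i in range(3)]
    return solve(XtX, Xty)

# Hypothetical data that do not lie on any single plane.
points = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 3)]
beta = least_squares_plane(points)
residuals = [yj - (beta[0] + beta[1] * u + beta[2] * v) for u, v, yj in points]
print(beta, residuals)
```

The residual vector is orthogonal to each column of the design matrix, which is what the normal equations express.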

Further Reading 

Ferguson, J., Introduction to Linear Algebra in Geology (New York: Chapman & Hall, 
1994). 

Krumbein, W. C., and F. A. Graybill, An Introduction to Statistical Models in Geology
(New York: McGraw-Hill, 1965). 

Legendre, P., and L. Legendre, Numerical Ecology (Amsterdam: Elsevier, 1998). 

Unwin, David J., An Introduction to Trend Surface Analysis, Concepts and Techniques 
in Modern Geography, No. 5 (Norwich, England: Geo Books, 1975). 




PRACTICE PROBLEM 


When the monthly sales of a product are subject to seasonal fluctuations, a curve that 
approximates the sales data might have the form 

\[ y = \beta_0 + \beta_1 x + \beta_2 \sin(2\pi x/12) \]

where x is the time in months. The term $\beta_0 + \beta_1 x$ gives the basic sales trend, and
the sine term reflects the seasonal changes in sales. Give the design matrix and the
parameter vector for the linear model that leads to a least-squares fit of the equation
above. Assume the data are $(x_1, y_1), \ldots, (x_n, y_n)$.
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6.6 EXERCISES 

In Exercises 1-4, find the equation $y = \beta_0 + \beta_1 x$ of the least-squares
line that best fits the given data points.

1. (0,1), (1,1), (2,2), (3,2) 

2. (1,0), (2,1), (4,2), (5,3) 

3. (-1,0), (0,1), (1,2), (2,4) 

4. (2,3), (3,2), (5,1), (6,0) 

5. Let X be the design matrix used to find the least-squares line
to fit data $(x_1, y_1), \ldots, (x_n, y_n)$. Use a theorem in Section 6.5
to show that the normal equations have a unique solution
if and only if the data include at least two data points with
different x-coordinates.

6. Let X be the design matrix in Example 2 corresponding to
a least-squares fit of a parabola to data $(x_1, y_1), \ldots, (x_n, y_n)$.
Suppose $x_1$, $x_2$, and $x_3$ are distinct. Explain why there is only
one parabola that fits the data best, in a least-squares sense.
(See Exercise 5.)

7. A certain experiment produces the data (1,1.8), (2,2.7), 
(3,3.4), (4, 3.8), (5,3.9). Describe the model that produces 
a least-squares fit of these points by a function of the form 

\[ y = \beta_1 x + \beta_2 x^2 \]

Such a function might arise, for example, as the revenue from 
the sale of x units of a product, when the amount offered for 
sale affects the price to be set for the product. 

a. Give the design matrix, the observation vector, and the 
unknown parameter vector. 

b. [M] Find the associated least-squares curve for the data. 

8. A simple curve that often makes a good model for the variable
costs of a company, as a function of the sales level x,
has the form $y = \beta_1 x + \beta_2 x^2 + \beta_3 x^3$. There is no constant
term because fixed costs are not included.

a. Give the design matrix and the parameter vector for the
linear model that leads to a least-squares fit of the equation
above, with data $(x_1, y_1), \ldots, (x_n, y_n)$.

b. [M] Find the least-squares curve of the form above to fit
the data (4, 1.58), (6, 2.08), (8, 2.5), (10, 2.8), (12, 3.1),
(14, 3.4), (16, 3.8), and (18, 4.32), with values in thousands.
If possible, produce a graph that shows the data
points and the graph of the cubic approximation.

9. A certain experiment produces the data (1, 7.9), (2, 5.4), and
(3, -.9). Describe the model that produces a least-squares fit
of these points by a function of the form

\[ y = A \cos x + B \sin x \]

10. Suppose radioactive substances A and B have decay constants
of .02 and .07, respectively. If a mixture of these two
substances at time $t = 0$ contains $M_A$ grams of A and $M_B$
grams of B, then a model for the total amount y of the mixture
present at time t is

\[ y = M_A e^{-.02t} + M_B e^{-.07t} \qquad (6) \]

Suppose the initial amounts $M_A$ and $M_B$ are unknown,
but a scientist is able to measure the total amounts
present at several times and records the following points
$(t_i, y_i)$: (10, 21.34), (11, 20.68), (12, 20.05), (14, 18.87),
and (15, 18.30).

a. Describe a linear model that can be used to estimate
$M_A$ and $M_B$.

b. [M] Find the least-squares curve based on (6).



Halley’s Comet last appeared in 1986 and will reappear in 
2061. 


11. [M] According to Kepler’s first law, a comet should have
an elliptic, parabolic, or hyperbolic orbit (with gravitational
attractions from the planets ignored). In suitable polar coordinates,
the position $(r, \vartheta)$ of a comet satisfies an equation of
the form

\[ r = \beta + e(r \cdot \cos\vartheta) \]

where $\beta$ is a constant and e is the eccentricity of the orbit,
with $0 \le e < 1$ for an ellipse, $e = 1$ for a parabola, and $e > 1$
for a hyperbola. Suppose observations of a newly discovered
comet provide the data below. Determine the type of orbit,
and predict where the comet will be when $\vartheta = 4.6$ (radians).³

    ϑ    .88    1.10   1.42   1.77   2.14
    r    3.00   2.30   1.65   1.25   1.01


12. [M] A healthy child’s systolic blood pressure p (in millimeters
of mercury) and weight w (in pounds) are approximately
related by the equation

\[ \beta_0 + \beta_1 \ln w = p \]

Use the following experimental data to estimate the systolic
blood pressure of a healthy child weighing 100 pounds.


3 The basic idea of least-squares fitting of data is due to K. F. Gauss 
(and, independently, to A. Legendre), whose initial rise to fame occurred 
in 1801 when he used the method to determine the path of the asteroid 
Ceres. Forty days after the asteroid was discovered, it disappeared behind 
the sun. Gauss predicted it would appear ten months later and gave its 
location. The accuracy of the prediction astonished the European scientific 
community. 
    w       44     61     81     113    131
    ln w    3.78   4.11   4.39   4.73   4.88
    p       91     98     103    110    112


13. [M] To measure the takeoff performance of an airplane, the
horizontal position of the plane was measured every second,
from $t = 0$ to $t = 12$. The positions (in feet) were: 0, 8.8,
29.9, 62.0, 104.7, 159.1, 222.0, 294.5, 380.4, 471.1, 571.7,
686.8, and 809.2.

a. Find the least-squares cubic curve $y = \beta_0 + \beta_1 t +
\beta_2 t^2 + \beta_3 t^3$ for these data.

b. Use the result of part (a) to estimate the velocity of the
plane when $t = 4.5$ seconds.

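14. Let $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ and $\bar{y} = \frac{1}{n}(y_1 + \cdots + y_n)$.
Show that the least-squares line for the data
$(x_1, y_1), \ldots, (x_n, y_n)$ must pass through $(\bar{x}, \bar{y})$. That is, show
that $\bar{x}$ and $\bar{y}$ satisfy the linear equation $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$. [Hint:
Derive this equation from the vector equation $\mathbf{y} = X\hat{\boldsymbol{\beta}} + \boldsymbol{\epsilon}$.
Denote the first column of X by $\mathbf{1}$. Use the fact that the
residual vector $\boldsymbol{\epsilon}$ is orthogonal to the column space of X and
hence is orthogonal to $\mathbf{1}$.]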


17. a. Rewrite the data in Example 1 with new x-coordinates
in mean deviation form. Let X be the associated design
matrix. Why are the columns of X orthogonal?
b. Write the normal equations for the data in part (a), and
solve them to find the least-squares line, $y = \beta_0 + \beta_1 x^*$,
where $x^* = x - 5.5$.

18. Suppose the x-coordinates of the data $(x_1, y_1), \ldots, (x_n, y_n)$
are in mean deviation form, so that $\sum x_i = 0$. Show that if
X is the design matrix for the least-squares line in this case,
then $X^T X$ is a diagonal matrix.

Exercises 19 and 20 involve a design matrix X with two or more
columns and a least-squares solution $\hat{\boldsymbol{\beta}}$ of $\mathbf{y} = X\boldsymbol{\beta}$. Consider the
following numbers.

(i) $\|X\hat{\boldsymbol{\beta}}\|^2$: the sum of the squares of the “regression term.”
Denote this number by SS(R).

(ii) $\|\mathbf{y} - X\hat{\boldsymbol{\beta}}\|^2$: the sum of the squares for error term. Denote
this number by SS(E).

(iii) $\|\mathbf{y}\|^2$: the “total” sum of the squares of the y-values. Denote
this number by SS(T).


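Given data for a least-squares problem, $(x_1, y_1), \ldots, (x_n, y_n)$, the
following abbreviations are helpful:

\[
\textstyle \sum x = \sum_{i=1}^{n} x_i, \quad \sum x^2 = \sum_{i=1}^{n} x_i^2, \quad
\sum y = \sum_{i=1}^{n} y_i, \quad \sum xy = \sum_{i=1}^{n} x_i y_i
\]

The normal equations for a least-squares line $y = \hat{\beta}_0 + \hat{\beta}_1 x$ may
be written in the form

\[
\begin{aligned}
n\hat{\beta}_0 + \hat{\beta}_1 \textstyle\sum x &= \textstyle\sum y \\
\hat{\beta}_0 \textstyle\sum x + \hat{\beta}_1 \textstyle\sum x^2 &= \textstyle\sum xy
\end{aligned}
\qquad (7)
\]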

15. Derive the normal equations (7) from the matrix form given 
in this section. 

16. Use a matrix inverse to solve the system of equations in (7)
and thereby obtain formulas for $\hat{\beta}_0$ and $\hat{\beta}_1$ that appear in many
statistics texts.


Every statistics text that discusses regression and the linear model
$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$ introduces these numbers, though terminology and
notation vary somewhat. To simplify matters, assume that the
mean of the y-values is zero. In this case, SS(T) is proportional
to what is called the variance of the set of y-values.

19. Justify the equation SS(T) = SS(R) + SS(E). [Hint: Use a
theorem, and explain why the hypotheses of the theorem are
satisfied.] This equation is extremely important in statistics,
both in regression theory and in the analysis of variance.

20. Show that $\|X\hat{\boldsymbol{\beta}}\|^2 = \hat{\boldsymbol{\beta}}^T X^T \mathbf{y}$. [Hint: Rewrite the left side
and use the fact that $\hat{\boldsymbol{\beta}}$ satisfies the normal equations.] This
formula for SS(R) is used in statistics. From this and from
Exercise 19, obtain the standard formula for SS(E):

\[ \text{SS(E)} = \mathbf{y}^T \mathbf{y} - \hat{\boldsymbol{\beta}}^T X^T \mathbf{y} \]


SOLUTION TO PRACTICE PROBLEM 


Sales trend with seasonal fluctuations.


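Construct $X$ and $\boldsymbol{\beta}$ so that the kth row of $X\boldsymbol{\beta}$ is the predicted y-value that corresponds
to the data point $(x_k, y_k)$, namely,

\[ \beta_0 + \beta_1 x_k + \beta_2 \sin(2\pi x_k/12) \]

It should be clear that

\[
X = \begin{bmatrix} 1 & x_1 & \sin(2\pi x_1/12) \\ \vdots & \vdots & \vdots \\ 1 & x_n & \sin(2\pi x_n/12) \end{bmatrix},
\quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
\]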
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6.7 INNER PRODUCT SPACES 


Notions of length, distance, and orthogonality are often important in applications
involving a vector space. For $\mathbb{R}^n$, these concepts were based on the properties of the
inner product listed in Theorem 1 of Section 6.1. For other spaces, we need analogues of
the inner product with the same properties. The conclusions of Theorem 1 now become
axioms in the following definition.


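An inner product on a vector space $V$ is a function that, to each pair of vectors
$\mathbf{u}$ and $\mathbf{v}$ in $V$, associates a real number $\langle \mathbf{u}, \mathbf{v} \rangle$ and satisfies the following axioms,
for all $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$ in $V$ and all scalars c:

1. $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$

2. $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$

3. $\langle c\mathbf{u}, \mathbf{v} \rangle = c\langle \mathbf{u}, \mathbf{v} \rangle$

4. $\langle \mathbf{u}, \mathbf{u} \rangle \ge 0$ and $\langle \mathbf{u}, \mathbf{u} \rangle = 0$ if and only if $\mathbf{u} = \mathbf{0}$

A vector space with an inner product is called an inner product space.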


The vector space $\mathbb{R}^n$ with the standard inner product is an inner product space, and
nearly everything discussed in this chapter for $\mathbb{R}^n$ carries over to inner product spaces.
The examples in this section and the next lay the foundation for a variety of applications
treated in courses in engineering, physics, mathematics, and statistics.

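EXAMPLE 1 Fix any two positive numbers, say, 4 and 5, and for vectors
$\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ in $\mathbb{R}^2$, set

\[ \langle \mathbf{u}, \mathbf{v} \rangle = 4u_1 v_1 + 5u_2 v_2 \qquad (1) \]

Show that equation (1) defines an inner product.

SOLUTION Certainly Axiom 1 is satisfied, because $\langle \mathbf{u}, \mathbf{v} \rangle = 4u_1 v_1 + 5u_2 v_2 =
4v_1 u_1 + 5v_2 u_2 = \langle \mathbf{v}, \mathbf{u} \rangle$. If $\mathbf{w} = (w_1, w_2)$, then

\[
\begin{aligned}
\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle &= 4(u_1 + v_1)w_1 + 5(u_2 + v_2)w_2 \\
&= 4u_1 w_1 + 5u_2 w_2 + 4v_1 w_1 + 5v_2 w_2 \\
&= \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle
\end{aligned}
\]

This verifies Axiom 2. For Axiom 3, compute

\[ \langle c\mathbf{u}, \mathbf{v} \rangle = 4(cu_1)v_1 + 5(cu_2)v_2 = c(4u_1 v_1 + 5u_2 v_2) = c\langle \mathbf{u}, \mathbf{v} \rangle \]

For Axiom 4, note that $\langle \mathbf{u}, \mathbf{u} \rangle = 4u_1^2 + 5u_2^2 \ge 0$, and $4u_1^2 + 5u_2^2 = 0$ only if $u_1 = u_2 = 0$,
that is, if $\mathbf{u} = \mathbf{0}$. Also, $\langle \mathbf{0}, \mathbf{0} \rangle = 0$. So (1) defines an inner product on $\mathbb{R}^2$. ■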

Inner products similar to (1) can be defined on $\mathbb{R}^n$. They arise naturally in
connection with “weighted least-squares” problems, in which weights are assigned to
the various entries in the sum for the inner product in such a way that more importance
is given to the more reliable measurements.

From now on, when an inner product space involves polynomials or other functions, 
we will write the functions in the familiar way, rather than use the boldface type for 
vectors. Nevertheless, it is important to remember that each function is a vector when 
it is treated as an element of a vector space. 
EXAMPLE 2 Let $t_0, \ldots, t_n$ be distinct real numbers. For p and q in $\mathbb{P}_n$, define

\[ \langle p, q \rangle = p(t_0)q(t_0) + p(t_1)q(t_1) + \cdots + p(t_n)q(t_n) \qquad (2) \]

Inner product Axioms 1-3 are readily checked. For Axiom 4, note that

\[ \langle p, p \rangle = [p(t_0)]^2 + [p(t_1)]^2 + \cdots + [p(t_n)]^2 \ge 0 \]

Also, $\langle \mathbf{0}, \mathbf{0} \rangle = 0$. (The boldface zero here denotes the zero polynomial, the zero vector
in $\mathbb{P}_n$.) If $\langle p, p \rangle = 0$, then p must vanish at $n + 1$ points: $t_0, \ldots, t_n$. This is possible
only if p is the zero polynomial, because the degree of p is less than $n + 1$. Thus (2)
defines an inner product on $\mathbb{P}_n$. ■

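EXAMPLE 3 Let V be $\mathbb{P}_2$, with the inner product from Example 2, where $t_0 = 0$,
$t_1 = \frac{1}{2}$, and $t_2 = 1$. Let $p(t) = 12t^2$ and $q(t) = 2t - 1$. Compute $\langle p, q \rangle$ and $\langle q, q \rangle$.

SOLUTION

\[
\begin{aligned}
\langle p, q \rangle &= p(0)q(0) + p\!\left(\tfrac{1}{2}\right)q\!\left(\tfrac{1}{2}\right) + p(1)q(1) \\
&= (0)(-1) + (3)(0) + (12)(1) = 12 \\
\langle q, q \rangle &= [q(0)]^2 + \left[q\!\left(\tfrac{1}{2}\right)\right]^2 + [q(1)]^2 \\
&= (-1)^2 + (0)^2 + (1)^2 = 2
\end{aligned}
\]
■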
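
The arithmetic in Example 3 can be reproduced with a short script. This is an illustrative sketch only; the function name `evaluation_inner` is hypothetical.

```python
from fractions import Fraction

def evaluation_inner(p, q, points):
    """Inner product (2): sum of p(t) * q(t) over the evaluation points."""
    return sum(p(t) * q(t) for t in points)

points = [Fraction(0), Fraction(1, 2), Fraction(1)]   # t0, t1, t2 from Example 3
p = lambda t: 12 * t**2
q = lambda t: 2 * t - 1

pq = evaluation_inner(p, q, points)   # <p, q>
qq = evaluation_inner(q, q, points)   # <q, q>
print(pq, qq)
```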

Lengths, Distances, and Orthogonality 

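Let V be an inner product space, with the inner product denoted by $\langle \mathbf{u}, \mathbf{v} \rangle$. Just as in
$\mathbb{R}^n$, we define the length, or norm, of a vector $\mathbf{v}$ to be the scalar

\[ \|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} \]

Equivalently, $\|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle$. (This definition makes sense because $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$, but the
definition does not say that $\langle \mathbf{v}, \mathbf{v} \rangle$ is a “sum of squares,” because $\mathbf{v}$ need not be an element
of $\mathbb{R}^n$.)

A unit vector is one whose length is 1. The distance between $\mathbf{u}$ and $\mathbf{v}$ is $\|\mathbf{u} - \mathbf{v}\|$.
Vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.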

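EXAMPLE 4 Let $\mathbb{P}_2$ have the inner product (2) of Example 3. Compute the lengths
of the vectors $p(t) = 12t^2$ and $q(t) = 2t - 1$.

SOLUTION

\[
\begin{aligned}
\|p\|^2 = \langle p, p \rangle &= [p(0)]^2 + \left[p\!\left(\tfrac{1}{2}\right)\right]^2 + [p(1)]^2 \\
&= 0 + [3]^2 + [12]^2 = 153 \\
\|p\| &= \sqrt{153}
\end{aligned}
\]

From Example 3, $\langle q, q \rangle = 2$. Hence $\|q\| = \sqrt{2}$. ■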

The Gram-Schmidt Process 

The existence of orthogonal bases for finite-dimensional subspaces of an inner product
space can be established by the Gram-Schmidt process, just as in $\mathbb{R}^n$. Certain orthogonal
bases that arise frequently in applications can be constructed by this process.

The orthogonal projection of a vector onto a subspace W with an orthogonal basis 
can be constructed as usual. The projection does not depend on the choice of orthogonal 
basis, and it has the properties described in the Orthogonal Decomposition Theorem and 
the Best Approximation Theorem. 
EXAMPLE 5 Let V be $\mathbb{P}_4$ with the inner product in Example 2, involving evaluation
of polynomials at -2, -1, 0, 1, and 2, and view $\mathbb{P}_2$ as a subspace of V. Produce an
orthogonal basis for $\mathbb{P}_2$ by applying the Gram-Schmidt process to the polynomials 1, t,
and $t^2$.

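SOLUTION The inner product depends only on the values of a polynomial at -2, ..., 2,
so we list the values of each polynomial as a vector in $\mathbb{R}^5$, underneath the name of the
polynomial:¹

\[
\begin{array}{lccc}
\text{Polynomial:} & 1 & t & t^2 \\[4pt]
\text{Vector of values:} &
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} &
\begin{bmatrix} -2 \\ -1 \\ 0 \\ 1 \\ 2 \end{bmatrix} &
\begin{bmatrix} 4 \\ 1 \\ 0 \\ 1 \\ 4 \end{bmatrix}
\end{array}
\]

The inner product of two polynomials in V equals the (standard) inner product of their
corresponding vectors in $\mathbb{R}^5$. Observe that t is orthogonal to the constant function 1. So
take $p_0(t) = 1$ and $p_1(t) = t$. For $p_2$, use the vectors in $\mathbb{R}^5$ to compute the projection
of $t^2$ onto Span$\{p_0, p_1\}$:

\[
\begin{aligned}
\langle t^2, p_0 \rangle &= \langle t^2, 1 \rangle = 4 + 1 + 0 + 1 + 4 = 10 \\
\langle p_0, p_0 \rangle &= 5 \\
\langle t^2, p_1 \rangle &= \langle t^2, t \rangle = -8 + (-1) + 0 + 1 + 8 = 0
\end{aligned}
\]

The orthogonal projection of $t^2$ onto Span$\{1, t\}$ is $\frac{10}{5} p_0 + 0 p_1$. Thus

\[ p_2(t) = t^2 - 2p_0(t) = t^2 - 2 \]

An orthogonal basis for the subspace $\mathbb{P}_2$ of V is:

\[
\begin{array}{lccc}
\text{Polynomial:} & p_0 & p_1 & p_2 \\[4pt]
\text{Vector of values:} &
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} &
\begin{bmatrix} -2 \\ -1 \\ 0 \\ 1 \\ 2 \end{bmatrix} &
\begin{bmatrix} 2 \\ -1 \\ -2 \\ -1 \\ 2 \end{bmatrix}
\end{array}
\qquad (3)
\]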


■ 
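
Because the inner product in Example 5 depends only on the values at -2, ..., 2, the Gram-Schmidt process can be carried out on the vectors of values. The sketch below is one way to check the computation; the helper names `dot` and `gram_schmidt` are hypothetical.

```python
from fractions import Fraction

def dot(a, b):
    """Standard inner product of two vectors of values."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthogonalize the vectors in order, without normalizing."""
    basis = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for b in basis:
            c = dot(v, b) / dot(b, b)            # coefficient of the projection onto b
            w = [wi - c * bi for wi, bi in zip(w, b)]
        basis.append(w)
    return basis

points = [-2, -1, 0, 1, 2]
values = [[t**k for t in points] for k in range(3)]   # values of 1, t, t^2
p0, p1, p2 = gram_schmidt(values)
print(p2)
```

The last vector agrees with the values of $t^2 - 2$ at the five evaluation points.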


Best Approximation in Inner Product Spaces 

A common problem in applied mathematics involves a vector space V whose elements
are functions. The problem is to approximate a function f in V by a function g from a
specified subspace W of V. The “closeness” of the approximation of f depends on the
way $\|f - g\|$ is defined. We will consider only the case in which the distance between
f and g is determined by an inner product. In this case, the best approximation to f by
functions in W is the orthogonal projection of f onto the subspace W.

EXAMPLE 6 Let V be $\mathbb{P}_4$ with the inner product in Example 5, and let $p_0$, $p_1$,
and $p_2$ be the orthogonal basis found in Example 5 for the subspace $\mathbb{P}_2$. Find the best
approximation to $p(t) = 5 - \frac{1}{2}t^4$ by polynomials in $\mathbb{P}_2$.


1 Each polynomial in P 4 is uniquely determined by its value at the five numbers —2,..., 2. In fact, the 
correspondence between p and its vector of values is an isomorphism, that is, a one-to-one mapping onto 
R 5 that preserves linear combinations. 
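SOLUTION The values of $p_0$, $p_1$, and $p_2$ at the numbers -2, -1, 0, 1, and 2 are listed
in $\mathbb{R}^5$ vectors in (3) above. The corresponding values for p are -3, 9/2, 5, 9/2, and -3.
Compute

\[
\begin{aligned}
\langle p, p_0 \rangle &= 8, & \langle p, p_1 \rangle &= 0, & \langle p, p_2 \rangle &= -31 \\
\langle p_0, p_0 \rangle &= 5, & & & \langle p_2, p_2 \rangle &= 14
\end{aligned}
\]

Then the best approximation in V to p by polynomials in $\mathbb{P}_2$ is

\[
\begin{aligned}
\hat{p} = \operatorname{proj}_{\mathbb{P}_2} p
&= \frac{\langle p, p_0 \rangle}{\langle p_0, p_0 \rangle} p_0 + \frac{\langle p, p_1 \rangle}{\langle p_1, p_1 \rangle} p_1 + \frac{\langle p, p_2 \rangle}{\langle p_2, p_2 \rangle} p_2 \\
&= \tfrac{8}{5} p_0 + \tfrac{-31}{14} p_2 = \tfrac{8}{5} - \tfrac{31}{14}(t^2 - 2).
\end{aligned}
\]

This polynomial is the closest to p of all polynomials in $\mathbb{P}_2$, when the distance between
polynomials is measured only at -2, -1, 0, 1, and 2. See Fig. 1. ■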
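
The projection coefficients in Example 6 can be checked from the vectors of values. The following is a small sketch under the same setup; the names `dot` and `coeffs` are hypothetical.

```python
from fractions import Fraction

def dot(a, b):
    """Inner product given by evaluation at the five points."""
    return sum(x * y for x, y in zip(a, b))

points = [-2, -1, 0, 1, 2]
p0 = [1 for _ in points]                        # values of p0(t) = 1
p1 = [t for t in points]                        # values of p1(t) = t
p2 = [t * t - 2 for t in points]                # values of p2(t) = t^2 - 2
p = [5 - Fraction(t**4, 2) for t in points]     # values of p(t) = 5 - (1/2) t^4

# Weights of the orthogonal projection: <p, p_i> / <p_i, p_i>.
coeffs = [dot(p, b) / dot(b, b) for b in (p0, p1, p2)]

# Values of the projection, and the error p - p_hat at the five points.
p_hat = [coeffs[0] * a + coeffs[1] * b + coeffs[2] * c for a, b, c in zip(p0, p1, p2)]
error = [x - y for x, y in zip(p, p_hat)]
print(coeffs)
```

The error vector is orthogonal to each of the three basis vectors, as the Orthogonal Decomposition Theorem requires.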



FIGURE 1 

The polynomials $p_0$, $p_1$, and $p_2$ in Examples 5 and 6 belong to a class of polynomials
that are referred to in statistics as orthogonal polynomials.² The orthogonality refers
to the type of inner product described in Example 2.



FIGURE 2 The hypotenuse is the longest side.


Two Inequalities 

Given a vector $\mathbf{v}$ in an inner product space V and given a finite-dimensional subspace
W, we may apply the Pythagorean Theorem to the orthogonal decomposition of $\mathbf{v}$ with
respect to W and obtain

\[ \|\mathbf{v}\|^2 = \|\operatorname{proj}_W \mathbf{v}\|^2 + \|\mathbf{v} - \operatorname{proj}_W \mathbf{v}\|^2 \]

See Fig. 2. In particular, this shows that the norm of the projection of $\mathbf{v}$ onto W does not
exceed the norm of $\mathbf{v}$ itself. This simple observation leads to the following important
inequality.


THEOREM 16 


The Cauchy-Schwarz Inequality 
For all $\mathbf{u}$, $\mathbf{v}$ in V,

\[ |\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\| \, \|\mathbf{v}\| \qquad (4) \]


2 See Statistics and Experimental Design in Engineering and the Physical Sciences, 2nd ed., by Norman 
L. Johnson and Fred C. Leone (New York: John Wiley & Sons, 1977). Tables there list “Orthogonal 
Polynomials,” which are simply the values of the polynomial at numbers such as —2, —1, 0, 1, and 2. 












THEOREM 17 



FIGURE 3 The lengths of the sides of a triangle.

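PROOF If $\mathbf{u} = \mathbf{0}$, then both sides of (4) are zero, and hence the inequality is true in this
case. (See Practice Problem 1.) If $\mathbf{u} \ne \mathbf{0}$, let W be the subspace spanned by $\mathbf{u}$. Recall
that $\|c\mathbf{u}\| = |c| \, \|\mathbf{u}\|$ for any scalar c. Thus

\[
\|\operatorname{proj}_W \mathbf{v}\|
= \left\| \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle} \mathbf{u} \right\|
= \frac{|\langle \mathbf{v}, \mathbf{u} \rangle|}{|\langle \mathbf{u}, \mathbf{u} \rangle|} \|\mathbf{u}\|
= \frac{|\langle \mathbf{v}, \mathbf{u} \rangle|}{\|\mathbf{u}\|^2} \|\mathbf{u}\|
= \frac{|\langle \mathbf{u}, \mathbf{v} \rangle|}{\|\mathbf{u}\|}
\]

Since $\|\operatorname{proj}_W \mathbf{v}\| \le \|\mathbf{v}\|$, we have $\dfrac{|\langle \mathbf{u}, \mathbf{v} \rangle|}{\|\mathbf{u}\|} \le \|\mathbf{v}\|$, which gives (4).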


■ 


The Cauchy-Schwarz inequality is useful in many branches of mathematics. A few 
simple applications are presented in the exercises. Our main need for this inequality here 
is to prove another fundamental inequality involving norms of vectors. See Fig. 3. 


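The Triangle Inequality
For all $\mathbf{u}$, $\mathbf{v}$ in V,

\[ \|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\| \]

PROOF
\[
\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 &= \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{u} \rangle + 2\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle \\
&\le \|\mathbf{u}\|^2 + 2|\langle \mathbf{u}, \mathbf{v} \rangle| + \|\mathbf{v}\|^2 \\
&\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\| \, \|\mathbf{v}\| + \|\mathbf{v}\|^2 \qquad \text{Cauchy-Schwarz} \\
&= (\|\mathbf{u}\| + \|\mathbf{v}\|)^2
\end{aligned}
\]

The triangle inequality follows immediately by taking square roots of both sides. ■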


An Inner Product for C[a, b] (Calculus required) 

Probably the most widely used inner product space for applications is the vector space
$C[a, b]$ of all continuous functions on an interval $a \le t \le b$, with an inner product that
we will describe.

We begin by considering a polynomial p and any integer n larger than or equal
to the degree of p. Then p is in $\mathbb{P}_n$, and we may compute a “length” for p using the
inner product of Example 2 involving evaluation at $n + 1$ points in $[a, b]$. However,
this length of p captures the behavior at only those $n + 1$ points. Since p is in $\mathbb{P}_n$ for
all large n, we could use a much larger n, with many more points for the “evaluation”
inner product. See Fig. 4.

FIGURE 4 Using different numbers of evaluation points in $[a, b]$ to compute $\|p\|^2$.
Let us partition $[a, b]$ into $n + 1$ subintervals of length $\Delta t = (b - a)/(n + 1)$, and
let $t_0, \ldots, t_n$ be arbitrary points in these subintervals.

[Diagram: the interval from a to b divided into subintervals of length $\Delta t$, with sample points $t_0, \ldots, t_n$.]

If n is large, the inner product on $\mathbb{P}_n$ determined by $t_0, \ldots, t_n$ will tend to give a large
value to $\langle p, p \rangle$, so we scale it down and divide by $n + 1$. Observe that $1/(n + 1) =
\Delta t/(b - a)$, and define

\[
\langle p, q \rangle = \frac{1}{n + 1} \sum_{j=0}^{n} p(t_j) q(t_j)
= \frac{1}{b - a} \left[ \sum_{j=0}^{n} p(t_j) q(t_j) \, \Delta t \right]
\]

Now, let n increase without bound. Since polynomials p and q are continuous functions,
the expression in brackets is a Riemann sum that approaches a definite integral, and we
are led to consider the average value of $p(t)q(t)$ on the interval $[a, b]$:

\[ \frac{1}{b - a} \int_a^b p(t) q(t) \, dt \]

This quantity is defined for polynomials of any degree (in fact, for all continuous
functions), and it has all the properties of an inner product, as the next example shows.
The scale factor $1/(b - a)$ is inessential and is often omitted for simplicity.
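
The passage from the scaled evaluation inner product to the average-value integral can be observed numerically. In the sketch below, the function name is hypothetical, and taking the evaluation points to be the midpoints of the subintervals is an assumption made for illustration (the text allows arbitrary points). With $p(t) = q(t) = t$ on $[0, 1]$, the average value of $p(t)q(t)$ is $1/3$.

```python
def scaled_evaluation_inner(p, q, a, b, n):
    """(1/(n+1)) * sum of p(t_j) q(t_j), with t_j the midpoints of n+1 equal subintervals."""
    dt = (b - a) / (n + 1)
    points = [a + (j + 0.5) * dt for j in range(n + 1)]
    return sum(p(t) * q(t) for t in points) / (n + 1)

p = lambda t: t
q = lambda t: t

# The values approach the average of t^2 on [0, 1], which is 1/3, as n grows.
approximations = [scaled_evaluation_inner(p, q, 0.0, 1.0, n) for n in (9, 99, 999)]
errors = [abs(value - 1 / 3) for value in approximations]
print(approximations)
```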


EXAMPLE 7 For f, g in $C[a, b]$, set

\[ \langle f, g \rangle = \int_a^b f(t) g(t) \, dt \qquad (5) \]

Show that (5) defines an inner product on $C[a, b]$.

SOLUTION Inner product Axioms 1-3 follow from elementary properties of definite
integrals. For Axiom 4, observe that

\[ \langle f, f \rangle = \int_a^b [f(t)]^2 \, dt \ge 0 \]

The function $[f(t)]^2$ is continuous and nonnegative on $[a, b]$. If the definite integral of
$[f(t)]^2$ is zero, then $[f(t)]^2$ must be identically zero on $[a, b]$, by a theorem in advanced
calculus, in which case f is the zero function. Thus $\langle f, f \rangle = 0$ implies that f is the
zero function on $[a, b]$. So (5) defines an inner product on $C[a, b]$. ■

EXAMPLE 8 Let V be the space $C[0, 1]$ with the inner product of Example 7, and
let W be the subspace spanned by the polynomials $p_1(t) = 1$, $p_2(t) = 2t - 1$, and
$p_3(t) = 12t^2$. Use the Gram-Schmidt process to find an orthogonal basis for W.

SOLUTION Let $q_1 = p_1$, and compute

\[ \langle p_2, q_1 \rangle = \int_0^1 (2t - 1)(1) \, dt = (t^2 - t) \Big|_0^1 = 0 \]
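So $p_2$ is already orthogonal to $q_1$, and we can take $q_2 = p_2$. For the projection of $p_3$
onto $W_2 = \text{Span}\{q_1, q_2\}$, compute

\[
\begin{aligned}
\langle p_3, q_1 \rangle &= \int_0^1 12t^2 \cdot 1 \, dt = 4t^3 \Big|_0^1 = 4 \\
\langle q_1, q_1 \rangle &= \int_0^1 1 \cdot 1 \, dt = t \Big|_0^1 = 1 \\
\langle p_3, q_2 \rangle &= \int_0^1 12t^2 (2t - 1) \, dt = \int_0^1 (24t^3 - 12t^2) \, dt = 2 \\
\langle q_2, q_2 \rangle &= \int_0^1 (2t - 1)^2 \, dt = \frac{1}{6}(2t - 1)^3 \Big|_0^1 = \frac{1}{3}
\end{aligned}
\]

Then

\[
\operatorname{proj}_{W_2} p_3 = \frac{\langle p_3, q_1 \rangle}{\langle q_1, q_1 \rangle} q_1 + \frac{\langle p_3, q_2 \rangle}{\langle q_2, q_2 \rangle} q_2
= \frac{4}{1} q_1 + \frac{2}{1/3} q_2 = 4q_1 + 6q_2
\]

and

\[ q_3 = p_3 - \operatorname{proj}_{W_2} p_3 = p_3 - 4q_1 - 6q_2 \]

As a function, $q_3(t) = 12t^2 - 4 - 6(2t - 1) = 12t^2 - 12t + 2$. The orthogonal basis
for the subspace W is $\{q_1, q_2, q_3\}$. ■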
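
The integrals in Example 8 involve only polynomials, so they can be evaluated exactly by integrating term by term. The sketch below represents each polynomial by its list of coefficients, constant term first; this representation and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Coefficients of the product of two polynomials."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """Integral of p(t) q(t) over [0, 1]; the term c * t^k integrates to c / (k + 1)."""
    return sum(c / (k + 1) for k, c in enumerate(poly_mul(p, q)))

def gram_schmidt(polys):
    """Orthogonalize the polynomials in order, without normalizing."""
    basis = []
    for p in polys:
        w = [Fraction(c) for c in p]
        for b in basis:
            c = inner(p, b) / inner(b, b)
            size = max(len(w), len(b))
            w = w + [Fraction(0)] * (size - len(w))
            b_padded = b + [Fraction(0)] * (size - len(b))
            w = [wi - c * bi for wi, bi in zip(w, b_padded)]
        basis.append(w)
    return basis

p1 = [1]           # 1
p2 = [-1, 2]       # 2t - 1
p3 = [0, 0, 12]    # 12 t^2
q1, q2, q3 = gram_schmidt([p1, p2, p3])
print(q3)          # coefficients of 2 - 12 t + 12 t^2
```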


PRACTICE PROBLEMS 

Use the inner product axioms to verify the following statements. 

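1. $\langle \mathbf{v}, \mathbf{0} \rangle = \langle \mathbf{0}, \mathbf{v} \rangle = 0$.

2. $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$.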


6.7 EXERCISES 


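1. Let $\mathbb{R}^2$ have the inner product of Example 1, and let
$\mathbf{x} = (1, 1)$ and $\mathbf{y} = (5, -1)$.

a. Find $\|\mathbf{x}\|$, $\|\mathbf{y}\|$, and $|\langle \mathbf{x}, \mathbf{y} \rangle|^2$.

b. Describe all vectors $(z_1, z_2)$ that are orthogonal to $\mathbf{y}$.

2. Let $\mathbb{R}^2$ have the inner product of Example 1. Show that
the Cauchy-Schwarz inequality holds for $\mathbf{x} = (3, -2)$ and
$\mathbf{y} = (-2, 1)$. [Suggestion: Study $|\langle \mathbf{x}, \mathbf{y} \rangle|^2$.]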

Exercises 3-8 refer to $\mathbb{P}_2$ with the inner product given by evaluation
at -1, 0, and 1. (See Example 2.)

3. Compute $\langle p, q \rangle$, where $p(t) = 4 + t$, $q(t) = 5 - 4t^2$.

4. Compute $\langle p, q \rangle$, where $p(t) = 3t - t^2$, $q(t) = 3 + 2t^2$.

5. Compute $\|p\|$ and $\|q\|$, for p and q in Exercise 3.

6. Compute $\|p\|$ and $\|q\|$, for p and q in Exercise 4.

7. Compute the orthogonal projection of q onto the subspace
spanned by p, for p and q in Exercise 3.

8. Compute the orthogonal projection of q onto the subspace
spanned by p, for p and q in Exercise 4.


9. Let $\mathbb{P}_3$ have the inner product given by evaluation at -3, -1,
1, and 3. Let $p_0(t) = 1$, $p_1(t) = t$, and $p_2(t) = t^2$.

a. Compute the orthogonal projection of $p_2$ onto the subspace
spanned by $p_0$ and $p_1$.

b. Find a polynomial q that is orthogonal to $p_0$ and
$p_1$, such that $\{p_0, p_1, q\}$ is an orthogonal basis for
Span$\{p_0, p_1, p_2\}$. Scale the polynomial q so that its
vector of values at (-3, -1, 1, 3) is (1, -1, -1, 1).

10. Let $\mathbb{P}_3$ have the inner product as in Exercise 9, with $p_0$, $p_1$,
and q the polynomials described there. Find the best approximation
to $p(t) = t^3$ by polynomials in Span$\{p_0, p_1, q\}$.

11. Let $p_0$, $p_1$, and $p_2$ be the orthogonal polynomials described
in Example 5, where the inner product on $\mathbb{P}_4$ is given by
evaluation at -2, -1, 0, 1, and 2. Find the orthogonal
projection of $t^3$ onto Span$\{p_0, p_1, p_2\}$.

12. Find a polynomial $p_3$ such that $\{p_0, p_1, p_2, p_3\}$ (see Exercise 11)
is an orthogonal basis for the subspace $\mathbb{P}_3$ of
$\mathbb{P}_4$. Scale the polynomial $p_3$ so that its vector of values is
(-1, 2, 0, -2, 1).
13. Let A be any invertible $n \times n$ matrix. Show that for $\mathbf{u}$, $\mathbf{v}$ in
$\mathbb{R}^n$, the formula $\langle \mathbf{u}, \mathbf{v} \rangle = (A\mathbf{u}) \cdot (A\mathbf{v}) = (A\mathbf{u})^T (A\mathbf{v})$ defines
an inner product on $\mathbb{R}^n$.


14. Let T be a one-to-one linear transformation from a vector
space V into $\mathbb{R}^n$. Show that for $\mathbf{u}$, $\mathbf{v}$ in V, the formula
$\langle \mathbf{u}, \mathbf{v} \rangle = T(\mathbf{u}) \cdot T(\mathbf{v})$ defines an inner product on V.


Use the inner product axioms and other results of this section to 
verify the statements in Exercises 15-18. 


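15. $\langle \mathbf{u}, c\mathbf{v} \rangle = c\langle \mathbf{u}, \mathbf{v} \rangle$ for all scalars c.

16. If $\{\mathbf{u}, \mathbf{v}\}$ is an orthonormal set in V, then $\|\mathbf{u} - \mathbf{v}\| = \sqrt{2}$.

17. $\langle \mathbf{u}, \mathbf{v} \rangle = \frac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \frac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2$.

18. $\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\|\mathbf{u}\|^2 + 2\|\mathbf{v}\|^2$.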


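19. Given $a \ge 0$ and $b \ge 0$, let $\mathbf{u} = \begin{bmatrix} \sqrt{a} \\ \sqrt{b} \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} \sqrt{b} \\ \sqrt{a} \end{bmatrix}$.
Use the Cauchy-Schwarz inequality to compare the geometric
mean $\sqrt{ab}$ with the arithmetic mean $(a + b)/2$.

20. Let $\mathbf{u} = \begin{bmatrix} a \\ b \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. Use the Cauchy-Schwarz
inequality to show that

\[ \left( \frac{a + b}{2} \right)^2 \le \frac{a^2 + b^2}{2} \]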


Exercises 21-24 refer to $V = C[0, 1]$, with the inner product
given by an integral, as in Example 7.

21. Compute $\langle f, g \rangle$, where $f(t) = 1 - 3t^2$ and $g(t) = t - t^3$.

22. Compute $\langle f, g \rangle$, where $f(t) = 5t - 3$ and $g(t) = t^3 - t^2$.

23. Compute $\|f\|$ for f in Exercise 21.

24. Compute $\|g\|$ for g in Exercise 22.

25. Let V be the space $C[-1, 1]$ with the inner product of Example 7.
Find an orthogonal basis for the subspace spanned by
the polynomials 1, t, and $t^2$. The polynomials in this basis
are called Legendre polynomials.

26. Let V be the space $C[-2, 2]$ with the inner product of Example 7.
Find an orthogonal basis for the subspace spanned by
the polynomials 1, t, and $t^2$.

27. [M] Let $\mathbb{P}_4$ have the inner product as in Example 5, and let
$p_0$, $p_1$, $p_2$ be the orthogonal polynomials from that example.
Using your matrix program, apply the Gram-Schmidt
process to the set $\{p_0, p_1, p_2, t^3, t^4\}$ to create an orthogonal
basis for $\mathbb{P}_4$.

28. [M] Let V be the space $C[0, 2\pi]$ with the inner product
of Example 7. Use the Gram-Schmidt process to
create an orthogonal basis for the subspace spanned by
$\{1, \cos t, \cos^2 t, \cos^3 t\}$. Use a matrix program or computational
program to compute the appropriate definite integrals.


SOLUTIONS TO PRACTICE PROBLEMS 


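1. By Axiom 1, $\langle \mathbf{v}, \mathbf{0} \rangle = \langle \mathbf{0}, \mathbf{v} \rangle$. Then $\langle \mathbf{0}, \mathbf{v} \rangle = \langle 0\mathbf{v}, \mathbf{v} \rangle = 0\langle \mathbf{v}, \mathbf{v} \rangle$, by Axiom 3, so
$\langle \mathbf{0}, \mathbf{v} \rangle = 0$.

2. By Axioms 1, 2, and then 1 again, $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{v} + \mathbf{w}, \mathbf{u} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle =
\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$.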


6.8 APPLICATIONS OF INNER PRODUCT SPACES 

The examples in this section suggest how the inner product spaces defined in Section 6.7 
arise in practical problems. The first example is connected with the massive least- 
squares problem of updating the North American Datum, described in the chapter’s 
introductory example. 

Weighted Least-Squares 

Let $\mathbf{y}$ be a vector of $n$ observations, $y_1, \ldots, y_n$, and suppose we wish to approximate $\mathbf{y}$ by
a vector $\hat{\mathbf{y}}$ that belongs to some specified subspace of $\mathbb{R}^n$. (In Section 6.5, $\hat{\mathbf{y}}$ was written
as $A\mathbf{x}$ so that $\hat{\mathbf{y}}$ was in the column space of $A$.) Denote the entries in $\hat{\mathbf{y}}$ by $\hat{y}_1, \ldots, \hat{y}_n$.
Then the sum of the squares for error, or SS(E), in approximating $\mathbf{y}$ by $\hat{\mathbf{y}}$ is

\[ \text{SS(E)} = (y_1 - \hat{y}_1)^2 + \cdots + (y_n - \hat{y}_n)^2 \tag{1} \]

This is simply $\|\mathbf{y} - \hat{\mathbf{y}}\|^2$, using the standard length in $\mathbb{R}^n$.

Now suppose the measurements that produced the entries in $\mathbf{y}$ are not equally
reliable. (This was the case for the North American Datum, since measurements were
made over a period of 140 years. As another example, the entries in $\mathbf{y}$ might be
computed from various samples of measurements, with unequal sample sizes.) Then
it becomes appropriate to weight the squared errors in (1) in such a way that more
importance is assigned to the more reliable measurements.¹ If the weights are denoted
by $w_1^2, \ldots, w_n^2$, then the weighted sum of the squares for error is

\[ \text{Weighted SS(E)} = w_1^2 (y_1 - \hat{y}_1)^2 + \cdots + w_n^2 (y_n - \hat{y}_n)^2 \tag{2} \]

This is the square of the length of $\mathbf{y} - \hat{\mathbf{y}}$, where the length is derived from an inner
product analogous to that in Example 1 in Section 6.7, namely,

\[ \langle \mathbf{x}, \mathbf{y} \rangle = w_1^2 x_1 y_1 + \cdots + w_n^2 x_n y_n \]

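It is sometimes convenient to transform a weighted least-squares problem into an
equivalent ordinary least-squares problem. Let $W$ be the diagonal matrix with (positive)
$w_1, \ldots, w_n$ on its diagonal, so that

\[ W\mathbf{y} = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & & w_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} w_1 y_1 \\ w_2 y_2 \\ \vdots \\ w_n y_n \end{bmatrix} \]

with a similar expression for $W\hat{\mathbf{y}}$. Observe that the $j$th term in (2) can be written as

\[ w_j^2 (y_j - \hat{y}_j)^2 = (w_j y_j - w_j \hat{y}_j)^2 \]

It follows that the weighted SS(E) in (2) is the square of the ordinary length in $\mathbb{R}^n$ of
$W\mathbf{y} - W\hat{\mathbf{y}}$, which we write as $\|W\mathbf{y} - W\hat{\mathbf{y}}\|^2$.

Now suppose the approximating vector $\hat{\mathbf{y}}$ is to be constructed from the columns of
a matrix $A$. Then we seek an $\hat{\mathbf{x}}$ that makes $A\hat{\mathbf{x}} = \hat{\mathbf{y}}$ as close to $\mathbf{y}$ as possible. However,
the measure of closeness is the weighted error,

\[ \|W\mathbf{y} - W\hat{\mathbf{y}}\|^2 = \|W\mathbf{y} - WA\hat{\mathbf{x}}\|^2 \]

Thus $\hat{\mathbf{x}}$ is the (ordinary) least-squares solution of the equation

\[ WA\mathbf{x} = W\mathbf{y} \]

The normal equation for the least-squares solution is

\[ (WA)^T WA\mathbf{x} = (WA)^T W\mathbf{y} \]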


EXAMPLE 1 Find the least-squares line $y = \beta_0 + \beta_1 x$ that best fits the data
$(-2, 3)$, $(-1, 5)$, $(0, 5)$, $(1, 4)$, and $(2, 3)$. Suppose the errors in measuring the $y$-values
of the last two data points are greater than for the other points. Weight these data half
as much as the rest of the data.


¹ Note for readers with a background in statistics: Suppose the errors in measuring the $y_i$ are independent
random variables with means equal to zero and variances of $\sigma_1^2, \ldots, \sigma_n^2$. Then the appropriate weights in (2)
are $w_i^2 = 1/\sigma_i^2$. The larger the variance of the error, the smaller the weight.

FIGURE 1 Weighted and ordinary least-squares lines.


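SOLUTION As in Section 6.6, write $X$ for the matrix $A$ and $\boldsymbol{\beta}$ for the vector $\mathbf{x}$, and
obtain

\[ X = \begin{bmatrix} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 3 \\ 5 \\ 5 \\ 4 \\ 3 \end{bmatrix} \]

For a weighting matrix, choose $W$ with diagonal entries 2, 2, 2, 1, and 1. Left-multiplication by $W$ scales the rows of $X$ and $\mathbf{y}$:

\[ WX = \begin{bmatrix} 2 & -4 \\ 2 & -2 \\ 2 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad W\mathbf{y} = \begin{bmatrix} 6 \\ 10 \\ 10 \\ 4 \\ 3 \end{bmatrix} \]

For the normal equation, compute

\[ (WX)^T WX = \begin{bmatrix} 14 & -9 \\ -9 & 25 \end{bmatrix} \quad \text{and} \quad (WX)^T W\mathbf{y} = \begin{bmatrix} 59 \\ -34 \end{bmatrix} \]

and solve

\[ \begin{bmatrix} 14 & -9 \\ -9 & 25 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 59 \\ -34 \end{bmatrix} \]

The solution of the normal equation is (to two significant digits) $\beta_0 = 4.3$ and $\beta_1 = .20$.
The desired line is

\[ y = 4.3 + .20x \]

In contrast, the ordinary least-squares line for these data is

\[ y = 4.0 - .10x \]

Both lines are displayed in Fig. 1. ■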
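The arithmetic in Example 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `least_squares` and the variable names are illustrative choices, not part of the text. It forms $WX$ and $W\mathbf{y}$, solves the normal equations, and repeats the computation without weights for comparison.

```python
import numpy as np

# Data from Example 1: x-coordinates and observed y-values.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([3.0, 5.0, 5.0, 4.0, 3.0])

# Design matrix X: a column of ones (intercept) and a column of x-values.
X = np.column_stack([np.ones_like(x), x])

# Diagonal weighting matrix: the last two points get half the weight.
W = np.diag([2.0, 2.0, 2.0, 1.0, 1.0])


def least_squares(M, b):
    """Solve the normal equations M^T M beta = M^T b (hypothetical helper)."""
    return np.linalg.solve(M.T @ M, M.T @ b)


beta_weighted = least_squares(W @ X, W @ y)  # weighted fit
beta_ordinary = least_squares(X, y)          # ordinary fit, for comparison

print("weighted:", beta_weighted)
print("ordinary:", beta_ordinary)
```

Solving the normal equations directly is fine for a small, well-conditioned problem like this one; for larger problems a QR-based solver is usually preferred.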


Trend Analysis of Data 

Let $f$ represent an unknown function whose values are known (perhaps only approximately) at $t_0, \ldots, t_n$. If there is a “linear trend” in the data $f(t_0), \ldots, f(t_n)$, then
we might expect to approximate the values of $f$ by a function of the form $\beta_0 + \beta_1 t$.
If there is a “quadratic trend” to the data, then we would try a function of the form
$\beta_0 + \beta_1 t + \beta_2 t^2$. This was discussed in Section 6.6, from a different point of view.

In some statistical problems, it is important to be able to separate the linear trend 
from the quadratic trend (and possibly cubic or higher-order trends). For instance, 
suppose engineers are analyzing the performance of a new car, and f(t) represents 
the distance between the car at time t and some reference point. If the car is traveling 
at constant velocity, then the graph of $f(t)$ should be a straight line whose slope is the
car’s velocity. If the gas pedal is suddenly pressed to the floor, the graph of f(t) will 
change to include a quadratic term and possibly a cubic term (due to the acceleration). 
To analyze the ability of the car to pass another car, for example, engineers may want 
to separate the quadratic and cubic components from the linear term. 

If the function is approximated by a curve of the form $y = \beta_0 + \beta_1 t + \beta_2 t^2$, the
coefficient $\beta_2$ may not give the desired information about the quadratic trend in the data,
because it may not be “independent” in a statistical sense from the other $\beta_i$. To make
what is known as a trend analysis of the data, we introduce an inner product on the
space $\mathbb{P}_n$ analogous to that given in Example 2 in Section 6.7. For $p$, $q$ in $\mathbb{P}_n$, define

\[ \langle p, q \rangle = p(t_0) q(t_0) + \cdots + p(t_n) q(t_n) \]

FIGURE 2 Approximation by a quadratic trend function.

In practice, statisticians seldom need to consider trends in data of degree higher than
cubic or quartic. So let $p_0$, $p_1$, $p_2$, $p_3$ denote an orthogonal basis of the subspace $\mathbb{P}_3$ of
$\mathbb{P}_n$, obtained by applying the Gram-Schmidt process to the polynomials $1$, $t$, $t^2$, and $t^3$.
By Supplementary Exercise 11 in Chapter 2, there is a polynomial $g$ in $\mathbb{P}_n$ whose values
at $t_0, \ldots, t_n$ coincide with those of the unknown function $f$. Let $\hat{g}$ be the orthogonal
projection (with respect to the given inner product) of $g$ onto $\mathbb{P}_3$, say,

\[ \hat{g} = c_0 p_0 + c_1 p_1 + c_2 p_2 + c_3 p_3 \]

Then $\hat{g}$ is called a cubic trend function, and $c_0, \ldots, c_3$ are the trend coefficients of
the data. The coefficient $c_1$ measures the linear trend, $c_2$ the quadratic trend, and $c_3$ the
cubic trend. It turns out that if the data have certain properties, these coefficients are
statistically independent.

Since $p_0, \ldots, p_3$ are orthogonal, the trend coefficients may be computed one at
a time, independently of one another. (Recall that $c_i = \langle g, p_i \rangle / \langle p_i, p_i \rangle$.) We can
ignore $p_3$ and $c_3$ if we want only the quadratic trend. And if, for example, we needed
to determine the quartic trend, we would have to find (via Gram-Schmidt) only a
polynomial $p_4$ in $\mathbb{P}_4$ that is orthogonal to $\mathbb{P}_3$ and compute $\langle g, p_4 \rangle / \langle p_4, p_4 \rangle$.


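EXAMPLE 2 The simplest and most common use of trend analysis occurs when the
points $t_0, \ldots, t_n$ can be adjusted so that they are evenly spaced and sum to zero. Fit a
quadratic trend function to the data $(-2, 3)$, $(-1, 5)$, $(0, 5)$, $(1, 4)$, and $(2, 3)$.

SOLUTION The $t$-coordinates are suitably scaled to use the orthogonal polynomials
found in Example 5 of Section 6.7:

\[
\begin{array}{lcccc}
\text{Polynomial:} & p_0 & p_1 & p_2 & \text{Data: } g \\[4pt]
\text{Vector of values:} &
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} &
\begin{bmatrix} -2 \\ -1 \\ 0 \\ 1 \\ 2 \end{bmatrix} &
\begin{bmatrix} 2 \\ -1 \\ -2 \\ -1 \\ 2 \end{bmatrix} &
\begin{bmatrix} 3 \\ 5 \\ 5 \\ 4 \\ 3 \end{bmatrix}
\end{array}
\]

The calculations involve only these vectors, not the specific formulas for the orthogonal
polynomials. The best approximation to the data by polynomials in $\mathbb{P}_2$ is the orthogonal
projection given by

\[
\hat{p} = \frac{\langle g, p_0 \rangle}{\langle p_0, p_0 \rangle} p_0 + \frac{\langle g, p_1 \rangle}{\langle p_1, p_1 \rangle} p_1 + \frac{\langle g, p_2 \rangle}{\langle p_2, p_2 \rangle} p_2
= \tfrac{20}{5} p_0 - \tfrac{1}{10} p_1 - \tfrac{7}{14} p_2
\]

and

\[ \hat{p}(t) = 4 - .1t - .5(t^2 - 2) \tag{3} \]

Since the coefficient of $p_2$ is not extremely small, it would be reasonable to conclude
that the trend is at least quadratic. This is confirmed by the graph in Fig. 2. ■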
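The trend coefficients in Example 2 depend only on the vectors of values, so they are easy to reproduce. The sketch below assumes NumPy; the function name `trend_coefficient` is a hypothetical helper introduced for illustration. It uses $p_2(t) = t^2 - 2$, which matches the vector of values listed for $p_2$ and equation (3).

```python
import numpy as np

# Evaluation points and data values from Example 2.
t = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
g = np.array([3.0, 5.0, 5.0, 4.0, 3.0])

# Values of the orthogonal polynomials p0, p1, p2 at the points t.
p0 = np.ones_like(t)
p1 = t
p2 = t**2 - 2.0


def trend_coefficient(g, p):
    """Return <g, p> / <p, p> for the evaluation inner product."""
    return (g @ p) / (p @ p)


# Each coefficient is computed independently because p0, p1, p2 are orthogonal.
c = np.array([trend_coefficient(g, p) for p in (p0, p1, p2)])

# Values of the quadratic trend function at the data points.
fitted = c[0] * p0 + c[1] * p1 + c[2] * p2

print("trend coefficients:", c)
print("fitted values:", fitted)
```

Because the basis is orthogonal, adding a cubic term later would not change the three coefficients already computed.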
Fourier Series (Calculus required)

Continuous functions are often approximated by linear combinations of sine and cosine
functions. For instance, a continuous function might represent a sound wave, an electric
signal of some type, or the movement of a vibrating mechanical system.

For simplicity, we consider functions on $0 \le t \le 2\pi$. It turns out that any function
in $C[0, 2\pi]$ can be approximated as closely as desired by a function of the form

\[ \frac{a_0}{2} + a_1 \cos t + \cdots + a_n \cos nt + b_1 \sin t + \cdots + b_n \sin nt \tag{4} \]

for a sufficiently large value of $n$. The function (4) is called a trigonometric polynomial. If $a_n$ and $b_n$ are not both zero, the polynomial is said to be of order $n$. The
connection between trigonometric polynomials and other functions in $C[0, 2\pi]$ depends
on the fact that for any $n \ge 1$, the set

\[ \{1, \cos t, \cos 2t, \ldots, \cos nt, \sin t, \sin 2t, \ldots, \sin nt\} \tag{5} \]

is orthogonal with respect to the inner product

\[ \langle f, g \rangle = \int_0^{2\pi} f(t) g(t)\, dt \tag{6} \]

This orthogonality is verified as in the following example and in Exercises 5 and 6.


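EXAMPLE 3 Let $C[0, 2\pi]$ have the inner product (6), and let $m$ and $n$ be unequal
positive integers. Show that $\cos mt$ and $\cos nt$ are orthogonal.

SOLUTION Use a trigonometric identity. When $m \ne n$,

\[
\begin{aligned}
\langle \cos mt, \cos nt \rangle &= \int_0^{2\pi} \cos mt \cos nt\, dt \\
&= \frac{1}{2} \int_0^{2\pi} [\cos(mt + nt) + \cos(mt - nt)]\, dt \\
&= \frac{1}{2} \left[ \frac{\sin(mt + nt)}{m + n} + \frac{\sin(mt - nt)}{m - n} \right]_0^{2\pi} = 0
\end{aligned}
\]

■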
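The orthogonality in Example 3 can also be observed numerically. This is a minimal sketch, assuming NumPy and a hand-written composite trapezoid rule for the inner product (6); the function name `inner` and the choice $m = 2$, $n = 5$ are illustrative assumptions.

```python
import numpy as np


def inner(f, g, samples=4001):
    """Approximate <f, g> = integral of f(t) g(t) over [0, 2*pi]
    with the composite trapezoid rule (hypothetical helper)."""
    t = np.linspace(0.0, 2.0 * np.pi, samples)
    y = f(t) * g(t)
    h = t[1] - t[0]
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])


m, n = 2, 5  # any pair of unequal positive integers would do

# Inner product of cos(mt) with cos(nt): close to zero.
ip_mn = inner(lambda t: np.cos(m * t), lambda t: np.cos(n * t))

# Inner product of cos(mt) with itself: close to pi (see Exercise 7).
ip_mm = inner(lambda t: np.cos(m * t), lambda t: np.cos(m * t))

print(ip_mn, ip_mm)
```

A numerical check of this kind illustrates the result for particular $m$ and $n$; it does not replace the proof given in the example.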


Let $W$ be the subspace of $C[0, 2\pi]$ spanned by the functions in (5). Given $f$
in $C[0, 2\pi]$, the best approximation to $f$ by functions in $W$ is called the $n$th-order
Fourier approximation to $f$ on $[0, 2\pi]$. Since the functions in (5) are orthogonal,
the best approximation is given by the orthogonal projection onto $W$. In this case, the
coefficients $a_k$ and $b_k$ in (4) are called the Fourier coefficients of $f$. The standard
formula for an orthogonal projection shows that

\[ a_k = \frac{\langle f, \cos kt \rangle}{\langle \cos kt, \cos kt \rangle}, \qquad b_k = \frac{\langle f, \sin kt \rangle}{\langle \sin kt, \sin kt \rangle}, \qquad k \ge 1 \]

Exercise 7 asks you to show that $\langle \cos kt, \cos kt \rangle = \pi$ and $\langle \sin kt, \sin kt \rangle = \pi$. Thus

\[ a_k = \frac{1}{\pi} \int_0^{2\pi} f(t) \cos kt\, dt, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} f(t) \sin kt\, dt \tag{7} \]

The coefficient of the (constant) function 1 in the orthogonal projection is

\[ \frac{\langle f, 1 \rangle}{\langle 1, 1 \rangle} = \frac{1}{2\pi} \int_0^{2\pi} f(t) \cdot 1\, dt = \frac{1}{2} \left[ \frac{1}{\pi} \int_0^{2\pi} f(t) \cos(0 \cdot t)\, dt \right] = \frac{a_0}{2} \]

where $a_0$ is defined by (7) for $k = 0$. This explains why the constant term in (4) is
written as $a_0/2$.

EXAMPLE 4 Find the $n$th-order Fourier approximation to the function $f(t) = t$ on
the interval $[0, 2\pi]$.

SOLUTION Compute

\[ \frac{a_0}{2} = \frac{1}{2} \cdot \frac{1}{\pi} \int_0^{2\pi} t\, dt = \frac{1}{2\pi} \left[ \frac{1}{2} t^2 \right]_0^{2\pi} = \pi \]

and for $k > 0$, using integration by parts,

\[
\begin{aligned}
a_k &= \frac{1}{\pi} \int_0^{2\pi} t \cos kt\, dt = \frac{1}{\pi} \left[ \frac{1}{k^2} \cos kt + \frac{t}{k} \sin kt \right]_0^{2\pi} = 0 \\
b_k &= \frac{1}{\pi} \int_0^{2\pi} t \sin kt\, dt = \frac{1}{\pi} \left[ \frac{1}{k^2} \sin kt - \frac{t}{k} \cos kt \right]_0^{2\pi} = -\frac{2}{k}
\end{aligned}
\]

Thus the $n$th-order Fourier approximation of $f(t) = t$ is

\[ \pi - 2 \sin t - \sin 2t - \frac{2}{3} \sin 3t - \cdots - \frac{2}{n} \sin nt \]

Figure 3 shows the third- and fourth-order Fourier approximations of $f$. ■

FIGURE 3 Fourier approximations of the function $f(t) = t$.

The norm of the difference between $f$ and a Fourier approximation is called the
mean square error in the approximation. (The term mean refers to the fact that
the norm is determined by an integral.) It can be shown that the mean square error
approaches zero as the order of the Fourier approximation increases. For this reason, it
is common to write

\[ f(t) = \frac{a_0}{2} + \sum_{m=1}^{\infty} (a_m \cos mt + b_m \sin mt) \]

This expression for $f(t)$ is called the Fourier series for $f$ on $[0, 2\pi]$. The term
$a_m \cos mt$, for example, is the projection of $f$ onto the one-dimensional subspace
spanned by $\cos mt$.
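The Fourier coefficients found in Example 4 can be approximated from formula (7) by numerical integration. The following is a sketch under stated assumptions: it uses NumPy and a simple composite trapezoid rule, and the names `integrate` and `fourier_coefficients` are hypothetical helpers, not library functions.

```python
import numpy as np


def integrate(y, t):
    """Composite trapezoid rule on an evenly spaced grid t."""
    h = t[1] - t[0]
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])


def fourier_coefficients(f, order, samples=20001):
    """Approximate a_0..a_order and b_0..b_order from formula (7) on [0, 2*pi]."""
    t = np.linspace(0.0, 2.0 * np.pi, samples)
    ft = f(t)
    a = np.array([integrate(ft * np.cos(k * t), t) / np.pi
                  for k in range(order + 1)])
    b = np.array([integrate(ft * np.sin(k * t), t) / np.pi
                  for k in range(order + 1)])
    return a, b


# Example 4: f(t) = t. The text gives a_0/2 = pi, a_k = 0, b_k = -2/k.
a, b = fourier_coefficients(lambda t: t, order=4)

print("a_0 / 2 :", a[0] / 2.0)
print("a_1..a_4:", a[1:])
print("b_1..b_4:", b[1:])
```

The grid size `samples` is an arbitrary choice; the computed values agree with the exact coefficients only up to the discretization error of the trapezoid rule.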

PRACTICE PROBLEMS 

1. Let $q_1(t) = 1$, $q_2(t) = t$, and $q_3(t) = 3t^2 - 4$. Verify that $\{q_1, q_2, q_3\}$ is an orthogonal set in $C[-2, 2]$ with the inner product of Example 7 in Section 6.7 (integration
from $-2$ to $2$).

2. Find the first-order and third-order Fourier approximations to

\[ f(t) = 3 - 2 \sin t + 5 \sin 2t - 6 \cos 2t \]

6.8 EXERCISES

1. Find the least-squares line $y = \beta_0 + \beta_1 x$ that best fits the
data $(-2, 0)$, $(-1, 0)$, $(0, 2)$, $(1, 4)$, and $(2, 4)$, assuming that
the first and last data points are less reliable. Weight them
half as much as the three interior points.

2. Suppose 5 out of 25 data points in a weighted least-squares
problem have a $y$-measurement that is less reliable than the
others, and they are to be weighted half as much as the other
20 points. One method is to weight the 20 points by a factor
of 1 and the other 5 by a factor of $\frac{1}{2}$. A second method is
to weight the 20 points by a factor of 2 and the other 5 by a
factor of 1. Do the two methods produce different results?
Explain.

3. Fit a cubic trend function to the data in Example 2. The
orthogonal cubic polynomial is $p_3(t) = \frac{5}{6} t^3 - \frac{17}{6} t$.

4. To make a trend analysis of six evenly spaced data points, one
can use orthogonal polynomials with respect to evaluation at
the points $t = -5, -3, -1, 1, 3$, and $5$.

a. Show that the first three orthogonal polynomials are

\[ p_0(t) = 1, \quad p_1(t) = t, \quad \text{and} \quad p_2(t) = \tfrac{3}{8} t^2 - \tfrac{35}{8} \]

(The polynomial $p_2$ has been scaled so that its values at
the evaluation points are small integers.)

b. Fit a quadratic trend function to the data
$(-5, 1)$, $(-3, 1)$, $(-1, 4)$, $(1, 4)$, $(3, 6)$, $(5, 8)$

In Exercises 5-14, the space is $C[0, 2\pi]$ with the inner product (6).

5. Show that $\sin mt$ and $\sin nt$ are orthogonal when $m \ne n$.

6. Show that $\sin mt$ and $\cos nt$ are orthogonal for all positive
integers $m$ and $n$.

7. Show that $\|\cos kt\|^2 = \pi$ and $\|\sin kt\|^2 = \pi$ for $k > 0$.

8. Find the third-order Fourier approximation to $f(t) = t - 1$.

9. Find the third-order Fourier approximation to $f(t) = 2\pi - t$.

10. Find the third-order Fourier approximation to the square
wave function, $f(t) = 1$ for $0 \le t < \pi$ and $f(t) = -1$ for
$\pi \le t < 2\pi$.

11. Find the third-order Fourier approximation to $\sin^2 t$, without
performing any integration calculations.

12. Find the third-order Fourier approximation to $\cos^3 t$, without
performing any integration calculations.

13. Explain why a Fourier coefficient of the sum of two functions
is the sum of the corresponding Fourier coefficients of the
two functions.

14. Suppose the first few Fourier coefficients of some function
$f$ in $C[0, 2\pi]$ are $a_0$, $a_1$, $a_2$, and $b_1$, $b_2$, $b_3$. Which of the
following trigonometric polynomials is closer to $f$? Defend
your answer.

\[ g(t) = \frac{a_0}{2} + a_1 \cos t + a_2 \cos 2t + b_1 \sin t \]

\[ h(t) = \frac{a_0}{2} + a_1 \cos t + a_2 \cos 2t + b_1 \sin t + b_2 \sin 2t \]

15. [M] Refer to the data in Exercise 13 in Section 6.6, concerning the takeoff performance of an airplane. Suppose the
possible measurement errors become greater as the speed of
the airplane increases, and let $W$ be the diagonal weighting
matrix whose diagonal entries are 1, 1, 1, .9, .9, .8, .7, .6, .5,
.4, .3, .2, and .1. Find the cubic curve that fits the data with
minimum weighted least-squares error, and use it to estimate
the velocity of the plane when $t = 4.5$ seconds.

16. [M] Let $f_4$ and $f_5$ be the fourth-order and fifth-order Fourier
approximations in $C[0, 2\pi]$ to the square wave function in
Exercise 10. Produce separate graphs of $f_4$ and $f_5$ on the
interval $[0, 2\pi]$, and produce a graph of $f_5$ on $[-2\pi, 2\pi]$.


SG The Linearity of an Orthogonal Projection 6-25 


SOLUTIONS TO PRACTICE PROBLEMS 


1. Compute

\[
\begin{aligned}
\langle q_1, q_2 \rangle &= \int_{-2}^{2} 1 \cdot t\, dt = \left. \tfrac{1}{2} t^2 \right|_{-2}^{2} = 0 \\
\langle q_1, q_3 \rangle &= \int_{-2}^{2} 1 \cdot (3t^2 - 4)\, dt = \left. (t^3 - 4t) \right|_{-2}^{2} = 0 \\
\langle q_2, q_3 \rangle &= \int_{-2}^{2} t \cdot (3t^2 - 4)\, dt = \left. \left( \tfrac{3}{4} t^4 - 2t^2 \right) \right|_{-2}^{2} = 0
\end{aligned}
\]

2. The third-order Fourier approximation to $f$ is the best approximation in $C[0, 2\pi]$
to $f$ by functions (vectors) in the subspace spanned by $1$, $\cos t$, $\cos 2t$, $\cos 3t$,
$\sin t$, $\sin 2t$, and $\sin 3t$. But $f$ is obviously in this subspace, so $f$ is its own best
approximation:

\[ f(t) = 3 - 2 \sin t + 5 \sin 2t - 6 \cos 2t \]

For the first-order approximation, the closest function to $f$ in the subspace $W = \text{Span}\{1, \cos t, \sin t\}$ is $3 - 2 \sin t$. The other two terms in the formula for $f(t)$ are
orthogonal to the functions in $W$, so they contribute nothing to the integrals that
give the Fourier coefficients for a first-order approximation.

[Figure: First- and third-order approximations to $f(t)$.]


CHAPTER 6 SUPPLEMENTARY EXERCISES 


1. The following statements refer to vectors in $\mathbb{R}^n$ (or $\mathbb{R}^m$) with
the standard inner product. Mark each statement True or
False. Justify each answer.

a. The length of every vector is a positive number.

b. A vector $\mathbf{v}$ and its negative $-\mathbf{v}$ have equal lengths.

c. The distance between $\mathbf{u}$ and $\mathbf{v}$ is $\|\mathbf{u} - \mathbf{v}\|$.

d. If $r$ is any scalar, then $\|r\mathbf{v}\| = r\|\mathbf{v}\|$.

e. If two vectors are orthogonal, they are linearly independent.

f. If $\mathbf{x}$ is orthogonal to both $\mathbf{u}$ and $\mathbf{v}$, then $\mathbf{x}$ must be
orthogonal to $\mathbf{u} - \mathbf{v}$.

g. If $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$, then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.

h. If $\|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$, then $\mathbf{u}$ and $\mathbf{v}$ are orthogonal.

i. The orthogonal projection of $\mathbf{y}$ onto $\mathbf{u}$ is a scalar multiple
of $\mathbf{y}$.

j. If a vector $\mathbf{y}$ coincides with its orthogonal projection onto
a subspace $W$, then $\mathbf{y}$ is in $W$.

k. The set of all vectors in $\mathbb{R}^n$ orthogonal to one fixed vector
is a subspace of $\mathbb{R}^n$.

l. If $W$ is a subspace of $\mathbb{R}^n$, then $W$ and $W^\perp$ have no
vectors in common.

m. If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthogonal set and if $c_1$, $c_2$, and $c_3$ are
scalars, then $\{c_1\mathbf{v}_1, c_2\mathbf{v}_2, c_3\mathbf{v}_3\}$ is an orthogonal set.

n. If a matrix $U$ has orthonormal columns, then $UU^T = I$.

o. A square matrix with orthogonal columns is an orthogonal matrix.

p. If a square matrix has orthonormal columns, then it also
has orthonormal rows.

q. If $W$ is a subspace, then $\|\text{proj}_W \mathbf{v}\|^2 + \|\mathbf{v} - \text{proj}_W \mathbf{v}\|^2 = \|\mathbf{v}\|^2$.

r. A least-squares solution $\hat{\mathbf{x}}$ of $A\mathbf{x} = \mathbf{b}$ is the vector $A\hat{\mathbf{x}}$ in
Col $A$ closest to $\mathbf{b}$, so that $\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$ for
all $\mathbf{x}$.

s. The normal equations for a least-squares solution of
$A\mathbf{x} = \mathbf{b}$ are given by $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$.

2. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ be an orthonormal set. Verify the following
equality by induction, beginning with $p = 2$. If $\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$, then

\[ \|\mathbf{x}\|^2 = |c_1|^2 + \cdots + |c_p|^2 \]

3. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ be an orthonormal set in $\mathbb{R}^n$. Verify the
following inequality, called Bessel’s inequality, which is true
for each $\mathbf{x}$ in $\mathbb{R}^n$:

\[ \|\mathbf{x}\|^2 \ge |\mathbf{x} \cdot \mathbf{v}_1|^2 + |\mathbf{x} \cdot \mathbf{v}_2|^2 + \cdots + |\mathbf{x} \cdot \mathbf{v}_p|^2 \]

4. Let $U$ be an $n \times n$ orthogonal matrix. Show that if
$\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is an orthonormal basis for $\mathbb{R}^n$, then so is
$\{U\mathbf{v}_1, \ldots, U\mathbf{v}_n\}$.

5. Show that if an $n \times n$ matrix $U$ satisfies $(U\mathbf{x}) \cdot (U\mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$
for all $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, then $U$ is an orthogonal matrix.

6. Show that if $U$ is an orthogonal matrix, then any real eigenvalue of $U$ must be $\pm 1$.

7. A Householder matrix, or an elementary reflector, has the
form $Q = I - 2\mathbf{u}\mathbf{u}^T$ where $\mathbf{u}$ is a unit vector. (See Exercise 13 in the Supplementary Exercises for Chapter 2.) Show
that $Q$ is an orthogonal matrix. (Elementary reflectors are often used in computer programs to produce a QR factorization
of a matrix $A$. If $A$ has linearly independent columns, then
left-multiplication by a sequence of elementary reflectors can
produce an upper triangular matrix.)

8. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation that preserves
lengths; that is, $\|T(\mathbf{x})\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

a. Show that $T$ also preserves orthogonality; that is,
$T(\mathbf{x}) \cdot T(\mathbf{y}) = 0$ whenever $\mathbf{x} \cdot \mathbf{y} = 0$.

b. Show that the standard matrix of $T$ is an orthogonal
matrix.

9. Let $\mathbf{u}$ and $\mathbf{v}$ be linearly independent vectors in $\mathbb{R}^n$ that are
not orthogonal. Describe how to find the best approximation
to $\mathbf{z}$ in $\mathbb{R}^n$ by vectors of the form $x_1\mathbf{u} + x_2\mathbf{v}$ without first
constructing an orthogonal basis for Span $\{\mathbf{u}, \mathbf{v}\}$.

10. Suppose the columns of $A$ are linearly independent. Determine what happens to the least-squares solution $\hat{\mathbf{x}}$ of $A\mathbf{x} = \mathbf{b}$
when $\mathbf{b}$ is replaced by $c\mathbf{b}$ for some nonzero scalar $c$.

11. If $a$, $b$, and $c$ are distinct numbers, then the following
system is inconsistent because the graphs of the equations
are parallel planes. Show that the set of all least-squares
solutions of the system is precisely the plane whose equation
is $x - 2y + 5z = (a + b + c)/3$.

\[
\begin{aligned}
x - 2y + 5z &= a \\
x - 2y + 5z &= b \\
x - 2y + 5z &= c
\end{aligned}
\]

12. Consider the problem of finding an eigenvalue of an $n \times n$
matrix $A$ when an approximate eigenvector $\mathbf{v}$ is known.
Since $\mathbf{v}$ is not exactly correct, the equation

\[ A\mathbf{v} = \lambda\mathbf{v} \tag{1} \]

will probably not have a solution. However, $\lambda$ can be
estimated by a least-squares solution when (1) is viewed
properly. Think of $\mathbf{v}$ as an $n \times 1$ matrix $V$, think of $\lambda$ as
a vector in $\mathbb{R}^1$, and denote the vector $A\mathbf{v}$ by the symbol $\mathbf{b}$.
Then (1) becomes $\mathbf{b} = \lambda V$, which may also be written as
$V\lambda = \mathbf{b}$. Find the least-squares solution of this system of $n$
equations in the one unknown $\lambda$, and write this solution using
the original symbols. The resulting estimate for $\lambda$ is called a
Rayleigh quotient. See Exercises 11 and 12 in Section 5.8.

13. Use the steps below to prove the following relations among
the four fundamental subspaces determined by an $m \times n$
matrix $A$.

\[ \text{Row } A = (\text{Nul } A)^\perp, \qquad \text{Col } A = (\text{Nul } A^T)^\perp \]

a. Show that Row $A$ is contained in $(\text{Nul } A)^\perp$. (Show that if
$\mathbf{x}$ is in Row $A$, then $\mathbf{x}$ is orthogonal to every $\mathbf{u}$ in Nul $A$.)

b. Suppose rank $A = r$. Find dim Nul $A$ and dim $(\text{Nul } A)^\perp$,
and then deduce from part (a) that Row $A = (\text{Nul } A)^\perp$.
[Hint: Study the exercises for Section 6.3.]

c. Explain why Col $A = (\text{Nul } A^T)^\perp$.

14. Explain why an equation $A\mathbf{x} = \mathbf{b}$ has a solution if and only
if $\mathbf{b}$ is orthogonal to all solutions of the equation $A^T\mathbf{x} = \mathbf{0}$.

Exercises 15 and 16 concern the (real) Schur factorization of an
$n \times n$ matrix $A$ in the form $A = URU^T$, where $U$ is an orthogonal
matrix and $R$ is an $n \times n$ upper triangular matrix.¹

15. Show that if $A$ admits a (real) Schur factorization, $A = URU^T$, then $A$ has $n$ real eigenvalues, counting multiplicities.

16. Let $A$ be an $n \times n$ matrix with $n$ real eigenvalues, counting
multiplicities, denoted by $\lambda_1, \ldots, \lambda_n$. It can be shown that
$A$ admits a (real) Schur factorization. Parts (a) and (b) show
the key ideas in the proof. The rest of the proof amounts to
repeating (a) and (b) for successively smaller matrices, and
then piecing together the results.

a. Let $\mathbf{u}_1$ be a unit eigenvector corresponding to $\lambda_1$, let
$\mathbf{u}_2, \ldots, \mathbf{u}_n$ be any other vectors such that $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$
is an orthonormal basis for $\mathbb{R}^n$, and then let $U = [\,\mathbf{u}_1 \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_n\,]$. Show that the first column of
$U^T A U$ is $\lambda_1\mathbf{e}_1$, where $\mathbf{e}_1$ is the first column of the $n \times n$
identity matrix.

b. Part (a) implies that $U^T A U$ has the form shown below.
Explain why the eigenvalues of $A_1$ are $\lambda_2, \ldots, \lambda_n$. [Hint:
See the Supplementary Exercises for Chapter 5.]

\[ U^T A U = \begin{bmatrix} \lambda_1 & * & * & * & * \\ 0 & & & & \\ \vdots & & A_1 & & \\ 0 & & & & \end{bmatrix} \]

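[M] When the right side of an equation $A\mathbf{x} = \mathbf{b}$ is changed
slightly, say, to $A\mathbf{x} = \mathbf{b} + \Delta\mathbf{b}$ for some vector $\Delta\mathbf{b}$, the solution
changes from $\mathbf{x}$ to $\mathbf{x} + \Delta\mathbf{x}$, where $\Delta\mathbf{x}$ satisfies $A(\Delta\mathbf{x}) = \Delta\mathbf{b}$.
The quotient $\|\Delta\mathbf{b}\| / \|\mathbf{b}\|$ is called the relative change in $\mathbf{b}$ (or
the relative error in $\mathbf{b}$ when $\Delta\mathbf{b}$ represents possible error in the
entries of $\mathbf{b}$). The relative change in the solution is $\|\Delta\mathbf{x}\| / \|\mathbf{x}\|$.
When $A$ is invertible, the condition number of $A$, written as
cond($A$), produces a bound on how large the relative change in
$\mathbf{x}$ can be:

\[ \frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} \le \text{cond}(A) \cdot \frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|} \tag{2} \]

In Exercises 17-20, solve $A\mathbf{x} = \mathbf{b}$ and $A(\Delta\mathbf{x}) = \Delta\mathbf{b}$, and show
that the inequality (2) holds in each case. (See the discussion of
ill-conditioned matrices in Exercises 41-43 in Section 2.3.)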


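17. $A = \begin{bmatrix} 4.5 & 3.1 \\ 1.6 & 1.1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 19.249 \\ 6.843 \end{bmatrix}$, $\Delta\mathbf{b} = \begin{bmatrix} .001 \\ -.003 \end{bmatrix}$

18. $A = \begin{bmatrix} 4.5 & 3.1 \\ 1.6 & 1.1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} .500 \\ -1.407 \end{bmatrix}$, $\Delta\mathbf{b} = \begin{bmatrix} .001 \\ -.003 \end{bmatrix}$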


¹ If complex numbers are allowed, every $n \times n$ matrix $A$ admits a
(complex) Schur factorization, $A = URU^{-1}$, where $R$ is upper triangular
and $U^{-1}$ is the conjugate transpose of $U$. This very useful fact is
discussed in Matrix Analysis, by Roger A. Horn and Charles R. Johnson
(Cambridge: Cambridge University Press, 1985), pp. 79-100.





















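19. $A = \begin{bmatrix} 7 & -6 & -4 & 1 \\ -5 & 1 & 0 & -2 \\ 10 & 11 & 7 & -3 \\ 19 & 9 & 7 & 1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} .100 \\ 2.888 \\ -1.404 \\ 1.462 \end{bmatrix}$, $\Delta\mathbf{b} = 10^{-4} \begin{bmatrix} .49 \\ -1.28 \\ 5.78 \\ 8.04 \end{bmatrix}$

20. $A = \begin{bmatrix} 7 & -6 & -4 & 1 \\ -5 & 1 & 0 & -2 \\ 10 & 11 & 7 & -3 \\ 19 & 9 & 7 & 1 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 4.230 \\ -11.043 \\ 49.991 \\ 69.536 \end{bmatrix}$, $\Delta\mathbf{b} = 10^{-4} \begin{bmatrix} .27 \\ 7.76 \\ -3.77 \\ 3.93 \end{bmatrix}$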













Symmetric Matrices 
and Quadratic Forms 



INTRODUCTORY EXAMPLE 

Multichannel Image Processing 

Around the world in little more than 80 minutes, the two 
Landsat satellites streak silently across the sky in near 
polar orbits, recording images of terrain and coastline, in 
swaths 185 kilometers wide. Every 16 days, each satellite 
passes over almost every square kilometer of the earth’s 
surface, so any location can be monitored every 8 days. 

The Landsat images are useful for many purposes. 
Developers and urban planners use them to study the rate 
and direction of urban growth, industrial development, and 
other changes in land usage. Rural countries can analyze 
soil moisture, classify the vegetation in remote regions, and 
locate inland lakes and streams. Governments can detect 
and assess damage from natural disasters, such as forest 
fires, lava flows, floods, and hurricanes. Environmental 
agencies can identify pollution from smokestacks and 
measure water temperatures in lakes and rivers near power 
plants. 

Sensors aboard the satellite acquire seven simul¬ 
taneous images of any region on earth to be studied. The 
sensors record energy from separate wavelength bands— 
three in the visible light spectrum and four in infrared and 
thermal bands. Each image is digitized and stored as a 
rectangular array of numbers, each number indicating the 
signal intensity at a corresponding small point (or pixel) 


on the image. Each of the seven images is one channel of 
a multichannel or multispectral image. 

The seven Landsat images of one fixed region typically 
contain much redundant information, since some features 
will appear in several images. Yet other features, because 
of their color or temperature, may reflect light that is 
recorded by only one or two sensors. One goal of 
multichannel image processing is to view the data in a 
way that extracts information better than studying each 
image separately. 

Principal component analysis is an effective way 
to suppress redundant information and provide in only 
one or two composite images most of the information 
from the initial data. Roughly speaking, the goal is to 
find a special linear combination of the images, that is, 
a list of weights that at each pixel combine all seven 
corresponding image values into one new value. The 
weights are chosen in a way that makes the range of light 
intensities—the scene variance— in the composite image 
(called the first principal component) greater than that in 
any of the original images. Additional component images
can also be constructed, by criteria that will be explained
in Section 7.5.

Principal component analysis is illustrated in the
photos below, taken over Railroad Valley, Nevada. Images
from three Landsat spectral bands are shown in (a)-(c).
The total information in the three bands is rearranged in
the three principal component images in (d)-(f). The first
component (d) displays (or “explains”) 93.5% of the scene
variance present in the initial data. In this way, the three-
channel initial data have been reduced to one-channel


data, with a loss in some sense of only 6.5% of the scene 
variance. 

Earth Satellite Corporation of Rockville, Maryland, 
which kindly supplied the photos shown here, is 
experimenting with images from 224 separate spectral 
bands. Principal component analysis, essential for such 
massive data sets, typically reduces the data to about 15 
usable principal components. 





(a) Spectral band 1: Visible blue. 


(b) Spectral band 4: Near infrared. 


(c) Spectral band 7: Mid-infrared. 




(d) Principal component 1: 93.5%. 


(e) Principal component 2: 5.3%. 


(f) Principal component 3: 1.2%. 



Symmetric matrices arise more often in applications, in one way or another, than any 
other major class of matrices. The theory is rich and beautiful, depending in an essential 
way on both diagonalization from Chapter 5 and orthogonality from Chapter 6. The 
diagonalization of a symmetric matrix, described in Section 7.1, is the foundation for 
the discussion in Sections 7.2 and 7.3 concerning quadratic forms. Section 7.3, in turn, 
is needed for the final two sections on the singular value decomposition and on the image 
processing described in the introductory example. Throughout the chapter, all vectors 
and matrices have real entries.

7.1 DIAGONALIZATION OF SYMMETRIC MATRICES


A symmetric matrix is a matrix $A$ such that $A^T = A$. Such a matrix is necessarily square.
Its main diagonal entries are arbitrary, but its other entries occur in pairs, on opposite
sides of the main diagonal.


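EXAMPLE 1 Of the following matrices, only the first three are symmetric:

\[ \text{Symmetric:} \quad \begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}, \quad \begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & 8 \\ 0 & 8 & -7 \end{bmatrix}, \quad \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} \]

\[ \text{Nonsymmetric:} \quad \begin{bmatrix} 1 & -3 \\ 3 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & -4 & 0 \\ -6 & 1 & -4 \\ 0 & -6 & 1 \end{bmatrix}, \quad \begin{bmatrix} 5 & 4 & 3 & 2 \\ 4 & 3 & 2 & 1 \\ 3 & 2 & 1 & 0 \end{bmatrix} \]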

To begin the study of symmetric matrices, it is helpful to review the diagonalization 
process of Section 5.3. 


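EXAMPLE 2 If possible, diagonalize the matrix $A = \begin{bmatrix} 6 & -2 & -1 \\ -2 & 6 & -1 \\ -1 & -1 & 5 \end{bmatrix}$.

SOLUTION The characteristic equation of $A$ is

\[ 0 = -\lambda^3 + 17\lambda^2 - 90\lambda + 144 = -(\lambda - 8)(\lambda - 6)(\lambda - 3) \]

Standard calculations produce a basis for each eigenspace:

\[ \lambda = 8: \ \mathbf{v}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = 6: \ \mathbf{v}_2 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}; \qquad \lambda = 3: \ \mathbf{v}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

These three vectors form a basis for $\mathbb{R}^3$. In fact, it is easy to check that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is
an orthogonal basis for $\mathbb{R}^3$. Experience from Chapter 6 suggests that an orthonormal
basis might be useful for calculations, so here are the normalized (unit) eigenvectors.

\[ \mathbf{u}_1 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} -1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix} \]

Let

\[ P = \begin{bmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}, \quad D = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{bmatrix} \]

Then $A = PDP^{-1}$, as usual. But this time, since $P$ is square and has orthonormal
columns, $P$ is an orthogonal matrix, and $P^{-1}$ is simply $P^T$. (See Section 6.2.) ■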
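The factorization in Example 2 can be reproduced with a matrix program. The sketch below assumes NumPy and uses `numpy.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order together with orthonormal eigenvectors. The columns of the computed `P` may therefore appear in a different order, and with different signs, than the matrix $P$ written in the example.

```python
import numpy as np

# The symmetric matrix from Example 2.
A = np.array([[6.0, -2.0, -1.0],
              [-2.0, 6.0, -1.0],
              [-1.0, -1.0, 5.0]])

# eigh returns eigenvalues in ascending order (3, 6, 8 here) and a matrix P
# whose columns are corresponding orthonormal eigenvectors.
eigenvalues, P = np.linalg.eigh(A)
D = np.diag(eigenvalues)

# P is orthogonal, so its inverse is its transpose, and A = P D P^T.
print("eigenvalues:", eigenvalues)
print("P^T P =\n", np.round(P.T @ P, 10))
print("P D P^T =\n", np.round(P @ D @ P.T, 10))
```

Any ordering of the eigenvalues gives a valid orthogonal diagonalization, provided the columns of $P$ are listed in the same order as the diagonal entries of $D$.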

Theorem 1 explains why the eigenvectors in Example 2 are orthogonal: they correspond to distinct eigenvalues.


THEOREM 1

If $A$ is symmetric, then any two eigenvectors from different eigenspaces are
orthogonal.


PROOF Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors that correspond to distinct eigenvalues, say, $\lambda_1$
and $\lambda_2$. To show that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$, compute

\[
\begin{aligned}
\lambda_1 \mathbf{v}_1 \cdot \mathbf{v}_2 &= (\lambda_1 \mathbf{v}_1)^T \mathbf{v}_2 = (A\mathbf{v}_1)^T \mathbf{v}_2 && \text{Since } \mathbf{v}_1 \text{ is an eigenvector} \\
&= (\mathbf{v}_1^T A^T) \mathbf{v}_2 = \mathbf{v}_1^T (A\mathbf{v}_2) && \text{Since } A^T = A \\
&= \mathbf{v}_1^T (\lambda_2 \mathbf{v}_2) && \text{Since } \mathbf{v}_2 \text{ is an eigenvector} \\
&= \lambda_2 \mathbf{v}_1^T \mathbf{v}_2 = \lambda_2 \mathbf{v}_1 \cdot \mathbf{v}_2
\end{aligned}
\]

Hence $(\lambda_1 - \lambda_2)\, \mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. But $\lambda_1 - \lambda_2 \ne 0$, so $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. ■

The special type of diagonalization in Example 2 is crucial for the theory of symmetric matrices. An $n \times n$ matrix $A$ is said to be orthogonally diagonalizable if there
are an orthogonal matrix $P$ (with $P^{-1} = P^T$) and a diagonal matrix $D$ such that

\[ A = PDP^T = PDP^{-1} \tag{1} \]

Such a diagonalization requires $n$ linearly independent and orthonormal eigenvectors. When is this possible? If $A$ is orthogonally diagonalizable as in (1), then

\[ A^T = (PDP^T)^T = P^{TT} D^T P^T = PDP^T = A \]

Thus $A$ is symmetric! Theorem 2 below shows that, conversely, every symmetric matrix
is orthogonally diagonalizable. The proof is much harder and is omitted; the main idea
for a proof will be given after Theorem 3.


THEOREM 2

An $n \times n$ matrix $A$ is orthogonally diagonalizable if and only if $A$ is a symmetric
matrix.


This theorem is rather amazing, because the work in Chapter 5 would suggest that 
it is usually impossible to tell when a matrix is diagonalizable. But this is not the case 
for symmetric matrices. 

The next example treats a matrix whose eigenvalues are not all distinct. 


EXAMPLE 3 Orthogonally diagonalize the matrix
$A = \begin{bmatrix} 3 & -2 & 4 \\ -2 & 6 & 2 \\ 4 & 2 & 3 \end{bmatrix}$, whose characteristic equation is

\[
0 = -\lambda^3 + 12\lambda^2 - 21\lambda - 98 = -(\lambda - 7)^2(\lambda + 2)
\]

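SOLUTION The usual calculations produce bases for the eigenspaces:

\[
\lambda = 7:\ \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix},\
\mathbf{v}_2 = \begin{bmatrix} -1/2 \\ 1 \\ 0 \end{bmatrix}; \qquad
\lambda = -2:\ \mathbf{v}_3 = \begin{bmatrix} -1 \\ -1/2 \\ 1 \end{bmatrix}
\]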

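Although $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent, they are not orthogonal. Recall from
Section 6.2 that the projection of $\mathbf{v}_2$ onto $\mathbf{v}_1$ is
$\dfrac{\mathbf{v}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1$, and the component of $\mathbf{v}_2$
orthogonal to $\mathbf{v}_1$ is

\[
\mathbf{z}_2 = \mathbf{v}_2 - \frac{\mathbf{v}_2 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1
= \begin{bmatrix} -1/2 \\ 1 \\ 0 \end{bmatrix} - \frac{-1/2}{2} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} -1/4 \\ 1 \\ 1/4 \end{bmatrix}
\]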


Then $\{\mathbf{v}_1, \mathbf{z}_2\}$ is an orthogonal set in the eigenspace for $\lambda = 7$. (Note that $\mathbf{z}_2$ is a linear
combination of the eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$, so $\mathbf{z}_2$ is in the eigenspace. This construction
of $\mathbf{z}_2$ is just the Gram-Schmidt process of Section 6.4.) Since the eigenspace is
two-dimensional (with basis $\mathbf{v}_1, \mathbf{v}_2$), the orthogonal set $\{\mathbf{v}_1, \mathbf{z}_2\}$ is an orthogonal basis for the
eigenspace, by the Basis Theorem. (See Section 2.9 or 4.5.)

Normalize $\mathbf{v}_1$ and $\mathbf{z}_2$ to obtain the following orthonormal basis for the eigenspace
for $\lambda = 7$:



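\[
\mathbf{u}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix}, \quad
\mathbf{u}_2 = \begin{bmatrix} -1/\sqrt{18} \\ 4/\sqrt{18} \\ 1/\sqrt{18} \end{bmatrix}
\]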


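An orthonormal basis for the eigenspace for $\lambda = -2$ is

\[
\mathbf{u}_3 = \frac{1}{\|2\mathbf{v}_3\|}\, 2\mathbf{v}_3
= \frac{1}{3} \begin{bmatrix} -2 \\ -1 \\ 2 \end{bmatrix}
= \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix}
\]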

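By Theorem 1, $\mathbf{u}_3$ is orthogonal to the other eigenvectors $\mathbf{u}_1$ and $\mathbf{u}_2$. Hence
$\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is an orthonormal set. Let

\[
P = [\,\mathbf{u}_1 \ \ \mathbf{u}_2 \ \ \mathbf{u}_3\,] =
\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{18} & -2/3 \\ 0 & 4/\sqrt{18} & -1/3 \\ 1/\sqrt{2} & 1/\sqrt{18} & 2/3 \end{bmatrix}, \quad
D = \begin{bmatrix} 7 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & -2 \end{bmatrix}
\]

Then $P$ orthogonally diagonalizes $A$, and $A = PDP^{-1}$. ■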
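The steps of Example 3 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text; the helper names (`dot`, `normalize`, `transpose`, `matmul`) are illustrative assumptions. It repeats the Gram-Schmidt step on the given eigenvectors, builds $P$ and $D$, and forms $P^TP$ and $PDP^T$ for comparison with $I$ and $A$.

```python
import math

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[dot(row, col) for col in zip(*Y)] for row in X]

A = [[3, -2, 4], [-2, 6, 2], [4, 2, 3]]
v1, v2 = [1, 0, 1], [-0.5, 1, 0]   # basis of the eigenspace for lambda = 7
v3 = [-1, -0.5, 1]                 # basis of the eigenspace for lambda = -2

# Gram-Schmidt step: subtract from v2 its projection onto v1.
coeff = dot(v2, v1) / dot(v1, v1)
z2 = [b - coeff * a for a, b in zip(v1, v2)]

u1, u2, u3 = normalize(v1), normalize(z2), normalize(v3)
P = transpose([u1, u2, u3])        # columns of P are u1, u2, u3
D = [[7, 0, 0], [0, 7, 0], [0, 0, -2]]

PtP = matmul(transpose(P), P)                 # should be close to the identity
PDPt = matmul(matmul(P, D), transpose(P))     # should be close to A
```

Up to rounding error, `PtP` is the identity and `PDPt` reproduces `A`, which is the content of the statement $A = PDP^T$.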

In Example 3, the eigenvalue 7 has multiplicity two and the eigenspace is two- 
dimensional. This fact is not accidental, as the next theorem shows. 


The Spectral Theorem 

The set of eigenvalues of a matrix A is sometimes called the spectrum of A, and the 
following description of the eigenvalues is called a spectral theorem. 


THEOREM 3  The Spectral Theorem for Symmetric Matrices

An $n \times n$ symmetric matrix $A$ has the following properties:

a. $A$ has $n$ real eigenvalues, counting multiplicities.

b. The dimension of the eigenspace for each eigenvalue $\lambda$ equals the multiplicity
of $\lambda$ as a root of the characteristic equation.

c. The eigenspaces are mutually orthogonal, in the sense that eigenvectors 
corresponding to different eigenvalues are orthogonal. 

d. A is orthogonally diagonalizable. 


Part (a) follows from Exercise 24 in Section 5.5. Part (b) follows easily from part 
(d). (See Exercise 31.) Part (c) is Theorem 1. Because of (a), a proof of (d) can be given 
using Exercise 32 and the Schur factorization discussed in Supplementary Exercise 16 
in Chapter 6. The details are omitted. 
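Spectral Decomposition

Suppose $A = PDP^{-1}$, where the columns of $P$ are orthonormal eigenvectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$
of $A$ and the corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$ are in the diagonal matrix $D$. Then,
since $P^{-1} = P^T$,

\[
A = PDP^T = [\,\mathbf{u}_1 \ \cdots \ \mathbf{u}_n\,]
\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}
\begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix}
= [\,\lambda_1\mathbf{u}_1 \ \cdots \ \lambda_n\mathbf{u}_n\,]
\begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix}
\]

Using the column-row expansion of a product (Theorem 10 in Section 2.4), we can
write

\[
A = \lambda_1 \mathbf{u}_1\mathbf{u}_1^T + \lambda_2 \mathbf{u}_2\mathbf{u}_2^T + \cdots + \lambda_n \mathbf{u}_n\mathbf{u}_n^T \tag{2}
\]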

This representation of $A$ is called a spectral decomposition of $A$ because it breaks
up $A$ into pieces determined by the spectrum (eigenvalues) of $A$. Each term in (2) is
an $n \times n$ matrix of rank 1. For example, every column of $\lambda_1\mathbf{u}_1\mathbf{u}_1^T$ is a multiple of $\mathbf{u}_1$.
Furthermore, each matrix $\mathbf{u}_j\mathbf{u}_j^T$ is a projection matrix in the sense that for each $\mathbf{x}$ in
$\mathbb{R}^n$, the vector $(\mathbf{u}_j\mathbf{u}_j^T)\mathbf{x}$ is the orthogonal projection of $\mathbf{x}$ onto the subspace spanned by
$\mathbf{u}_j$. (See Exercise 35.)

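EXAMPLE 4 Construct a spectral decomposition of the matrix $A$ that has the
orthogonal diagonalization

\[
A = \begin{bmatrix} 7 & 2 \\ 2 & 4 \end{bmatrix}
= \begin{bmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix}
\begin{bmatrix} 8 & 0 \\ 0 & 3 \end{bmatrix}
\begin{bmatrix} 2/\sqrt{5} & 1/\sqrt{5} \\ -1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix}
\]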

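SOLUTION Denote the columns of $P$ by $\mathbf{u}_1$ and $\mathbf{u}_2$. Then

\[
A = 8\mathbf{u}_1\mathbf{u}_1^T + 3\mathbf{u}_2\mathbf{u}_2^T
\]

To verify this decomposition of $A$, compute

\[
\mathbf{u}_1\mathbf{u}_1^T = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix} [\,2/\sqrt{5} \ \ 1/\sqrt{5}\,]
= \begin{bmatrix} 4/5 & 2/5 \\ 2/5 & 1/5 \end{bmatrix}
\]

\[
\mathbf{u}_2\mathbf{u}_2^T = \begin{bmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix} [\,-1/\sqrt{5} \ \ 2/\sqrt{5}\,]
= \begin{bmatrix} 1/5 & -2/5 \\ -2/5 & 4/5 \end{bmatrix}
\]

and

\[
8\mathbf{u}_1\mathbf{u}_1^T + 3\mathbf{u}_2\mathbf{u}_2^T
= \begin{bmatrix} 32/5 & 16/5 \\ 16/5 & 8/5 \end{bmatrix} + \begin{bmatrix} 3/5 & -6/5 \\ -6/5 & 12/5 \end{bmatrix}
= \begin{bmatrix} 7 & 2 \\ 2 & 4 \end{bmatrix} = A
\]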
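As an illustration of the spectral decomposition in Example 4, the sketch below (plain Python, with an assumed helper `outer` for the rank-1 product $\mathbf{u}\mathbf{u}^T$) accumulates the terms $\lambda_j \mathbf{u}_j\mathbf{u}_j^T$ and also applies $\mathbf{u}_1\mathbf{u}_1^T$ to a sample vector to show its role as a projection matrix. The sample vector `x` is an arbitrary choice made for this sketch.

```python
import math

def outer(u, v):
    """Rank-1 matrix u v^T, stored as a list of rows."""
    return [[a * b for b in v] for a in u]

s = math.sqrt(5)
# (eigenvalue, unit eigenvector) pairs taken from Example 4
eigenpairs = [(8, [2 / s, 1 / s]), (3, [-1 / s, 2 / s])]

# Accumulate A = sum of lambda_j * u_j u_j^T.
A = [[0.0, 0.0], [0.0, 0.0]]
for lam, u in eigenpairs:
    term = outer(u, u)
    for i in range(2):
        for j in range(2):
            A[i][j] += lam * term[i][j]

# (u1 u1^T) x is the orthogonal projection of x onto Span{u1}.
x = [1.0, 0.0]                     # arbitrary sample vector
u1 = eigenpairs[0][1]
B = outer(u1, u1)
proj = [sum(B[i][j] * x[j] for j in range(2)) for i in range(2)]
```

Here `A` agrees with the matrix of Example 4 up to rounding, and `proj` is the multiple $(\mathbf{x}\cdot\mathbf{u}_1)\mathbf{u}_1$ of $\mathbf{u}_1$.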
11.


12 . 


2/3 2/3 1/3 

0 \/V5 -2/V5 

v/5/3 -4/V45 -2/V45_ 


•5 .5 — .5 — .5 

-.5 .5 — .5 .5 

-.5 .5 .5 — .5 


Orthogonally diagonalize the matrices in Exercises 13-22, giving 
an orthogonal matrix P and a diagonal matrix D. To save you 


.Verify that 2 is an 
Then orthogonally 


24. Let A 


Verify that $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors of $A$. Then orthogonally
diagonalize $A$.


Determine which of the matrices in Exercises 1-6 are symmetric. 

-3 


3. 


time, the eigenvalues in Exercises 17-22 are: (17) 5, 2, —2; (18) 
25,3, -50; (19) 7, -2; (20) 13,7, 1; (21)9,5, 1; (22) 2,0. 



_-5 

3_ 


13. 

'3 r 


14. 

" 1 
c 

5' 

i 



"0 

8 

3" 


i j 




丄 


4. 

8 

0 -2 

15. 

■16 - 4 " 

16. 

'-7 

24 " 


3 

-2 

0_ 

-4 


24 


7 



"3 

1 

r 


" 1' 

23. Let A = 

1 

3 

i 

and v = 

1 


1 

1 

3 


1 


eigenvalue of A and y is an eigenvector, 
diagonalize A. 



—6 

2 

0 


1 

2 

1 

2 

5. 

0 

一 6 

2 

6. 

2 

1 

2 

1 


0 

0 

一 6 


1 

2 

1 

2 


Determine which of the matrices in Exercises 7-12 are orthogonal.
If orthogonal, find the inverse.



"1 

1 

3" 



-2 

-36 

0 " 

17. 

1 

3 

1 


18. 

-36 

-23 

0 


_3 

1 

1 _ 



0 

0 




3 

-2 

4 



7 

-4 

4" 



5 

-4 -2 


-2 


1 

-4 

-2 

5 2 

2 2 

,Vl = 

2 

1 

,and y 2 = 

1 

0 


NUMERICAL NOTE

When $A$ is symmetric and not too large, modern high-performance computer
algorithms calculate eigenvalues and eigenvectors with great precision. They apply
a sequence of similarity transformations to $A$ involving orthogonal matrices. The
diagonal entries of the transformed matrices converge rapidly to the eigenvalues
of $A$. (See the Numerical Notes in Section 5.2.) Using orthogonal matrices
generally prevents numerical errors from accumulating during the process. When
$A$ is symmetric, the sequence of orthogonal matrices combines to form an
orthogonal matrix whose columns are eigenvectors of $A$.

A nonsymmetric matrix cannot have a full set of orthogonal eigenvectors, but
the algorithm still produces fairly accurate eigenvalues. After that, nonorthogonal
techniques are needed to calculate eigenvectors.


PRACTICE PROBLEMS 

1. Show that if $A$ is a symmetric matrix, then $A^2$ is symmetric.

2. Show that if $A$ is orthogonally diagonalizable, then so is $A^2$.

7.1 EXERCISES 


0 0 2 0 




22 


21 


^^ 

1 / 1 / 


V/2V2 

1 / 1 / 


8 . 


. 6.8 


7. 


9. 
In Exercises 25 and 26, mark each statement True or False. Justify
each answer.

25. a. An $n \times n$ matrix that is orthogonally diagonalizable must
be symmetric.

b. If $A^T = A$ and if vectors $\mathbf{u}$ and $\mathbf{v}$ satisfy $A\mathbf{u} = 3\mathbf{u}$ and
$A\mathbf{v} = 4\mathbf{v}$, then $\mathbf{u} \cdot \mathbf{v} = 0$.

c. An $n \times n$ symmetric matrix has $n$ distinct real eigenvalues.

d. For a nonzero $\mathbf{v}$ in $\mathbb{R}^n$, the matrix $\mathbf{v}\mathbf{v}^T$ is called a projection
matrix.

26. a. Every symmetric matrix is orthogonally diagonalizable.

b. If $B = PDP^T$, where $P^T = P^{-1}$ and $D$ is a diagonal
matrix, then $B$ is a symmetric matrix.

c. An orthogonal matrix is orthogonally diagonalizable.

d. The dimension of an eigenspace of a symmetric matrix
equals the multiplicity of the corresponding eigenvalue.

27. Suppose $A$ is a symmetric $n \times n$ matrix and $B$ is any $n \times m$
matrix. Show that $B^TAB$, $B^TB$, and $BB^T$ are symmetric
matrices.

28. Show that if $A$ is an $n \times n$ symmetric matrix, then $(A\mathbf{x}) \cdot \mathbf{y} =
\mathbf{x} \cdot (A\mathbf{y})$ for all $\mathbf{x}, \mathbf{y}$ in $\mathbb{R}^n$.

29. Suppose $A$ is invertible and orthogonally diagonalizable.
Explain why $A^{-1}$ is also orthogonally diagonalizable.

30. Suppose $A$ and $B$ are both orthogonally diagonalizable and
$AB = BA$. Explain why $AB$ is also orthogonally diagonalizable.

31. Let $A = PDP^{-1}$, where $P$ is orthogonal and $D$ is diagonal,
and let $\lambda$ be an eigenvalue of $A$ of multiplicity $k$. Then
$\lambda$ appears $k$ times on the diagonal of $D$. Explain why the
dimension of the eigenspace for $\lambda$ is $k$.

32. Suppose $A = PRP^{-1}$, where $P$ is orthogonal and $R$ is upper
triangular. Show that if $A$ is symmetric, then $R$ is symmetric
and hence is actually a diagonal matrix.

33. Construct a spectral decomposition of $A$ from Example 2.

34. Construct a spectral decomposition of $A$ from Example 3.

35. Let $\mathbf{u}$ be a unit vector in $\mathbb{R}^n$, and let $B = \mathbf{u}\mathbf{u}^T$.

a. Given any $\mathbf{x}$ in $\mathbb{R}^n$, compute $B\mathbf{x}$ and show that $B\mathbf{x}$ is the
orthogonal projection of $\mathbf{x}$ onto $\mathbf{u}$, as described in Section 6.2.

b. Show that $B$ is a symmetric matrix and $B^2 = B$.

c. Show that $\mathbf{u}$ is an eigenvector of $B$. What is the corresponding
eigenvalue?

36. Let $B$ be an $n \times n$ symmetric matrix such that $B^2 = B$. Any
such matrix is called a projection matrix (or an orthogonal
projection matrix). Given any $\mathbf{y}$ in $\mathbb{R}^n$, let $\hat{\mathbf{y}} = B\mathbf{y}$ and
$\mathbf{z} = \mathbf{y} - \hat{\mathbf{y}}$.

a. Show that $\mathbf{z}$ is orthogonal to $\hat{\mathbf{y}}$.

b. Let $W$ be the column space of $B$. Show that $\mathbf{y}$ is the sum
of a vector in $W$ and a vector in $W^{\perp}$. Why does this prove
that $B\mathbf{y}$ is the orthogonal projection of $\mathbf{y}$ onto the column
space of $B$?


[M] Orthogonally diagonalize the matrices in Exercises 37-40.
To practice the methods of this section, do not use an eigenvector
routine from your matrix program. Instead, use the program to
find the eigenvalues, and, for each eigenvalue $\lambda$, find an orthonormal
basis for $\operatorname{Nul}(A - \lambda I)$, as in Examples 2 and 3.


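37. $\begin{bmatrix} 5 & 2 & 9 & -6 \\ 2 & 5 & -6 & 9 \\ 9 & -6 & 5 & 2 \\ -6 & 9 & 2 & 5 \end{bmatrix}$

38. $\begin{bmatrix} .38 & -.18 & -.06 & -.04 \\ -.18 & .59 & -.04 & .12 \\ -.06 & -.04 & .47 & -.12 \\ -.04 & .12 & -.12 & .41 \end{bmatrix}$

39. $\begin{bmatrix} .31 & .58 & .08 & .44 \\ .58 & -.56 & .44 & -.58 \\ .08 & .44 & .19 & -.08 \\ .44 & -.58 & -.08 & .31 \end{bmatrix}$

40. $\begin{bmatrix} 10 & 2 & 2 & -6 & 9 \\ 2 & 10 & 2 & -6 & 9 \\ 2 & 2 & 10 & -6 & 9 \\ -6 & -6 & -6 & 26 & 9 \\ 9 & 9 & 9 & 9 & -19 \end{bmatrix}$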


SOLUTIONS TO PRACTICE PROBLEMS 

1. $(A^2)^T = (AA)^T = A^TA^T$, by a property of transposes. By hypothesis, $A^T = A$. So
$(A^2)^T = AA = A^2$, which shows that $A^2$ is symmetric.

2. If $A$ is orthogonally diagonalizable, then $A$ is symmetric, by Theorem 2. By Practice
Problem 1, $A^2$ is symmetric and hence is orthogonally diagonalizable (Theorem 2).
7.2 QUADRATIC FORMS


Until now, our attention in this text has focused on linear equations, except for the sums
of squares encountered in Chapter 6 when computing $\mathbf{x}^T\mathbf{x}$. Such sums and more general
expressions, called quadratic forms, occur frequently in applications of linear algebra
to engineering (in design criteria and optimization) and signal processing (as output
noise power). They also arise, for example, in physics (as potential and kinetic energy),
differential geometry (as normal curvature of surfaces), economics (as utility functions),
and statistics (in confidence ellipsoids). Some of the mathematical background for such
applications flows easily from our work on symmetric matrices.

A quadratic form on $\mathbb{R}^n$ is a function $Q$ defined on $\mathbb{R}^n$ whose value at a vector $\mathbf{x}$
in $\mathbb{R}^n$ can be computed by an expression of the form $Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$, where $A$ is an $n \times n$
symmetric matrix. The matrix $A$ is called the matrix of the quadratic form.

The simplest example of a nonzero quadratic form is $Q(\mathbf{x}) = \mathbf{x}^TI\mathbf{x} = \|\mathbf{x}\|^2$.
Examples 1 and 2 show the connection between any symmetric matrix $A$ and the quadratic
form $\mathbf{x}^TA\mathbf{x}$.


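EXAMPLE 1 Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$. Compute $\mathbf{x}^TA\mathbf{x}$ for the following matrices:

a. $A = \begin{bmatrix} 4 & 0 \\ 0 & 3 \end{bmatrix}$ \qquad b. $A = \begin{bmatrix} 3 & -2 \\ -2 & 7 \end{bmatrix}$


SOLUTION

a. $\mathbf{x}^TA\mathbf{x} = [\,x_1 \ \ x_2\,] \begin{bmatrix} 4 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= [\,x_1 \ \ x_2\,] \begin{bmatrix} 4x_1 \\ 3x_2 \end{bmatrix} = 4x_1^2 + 3x_2^2.$

b. There are two $-2$ entries in $A$. Watch how they enter the calculations. The $(1,2)$-entry
in $A$ is in boldface type.

\[
\begin{aligned}
\mathbf{x}^TA\mathbf{x} &= [\,x_1 \ \ x_2\,] \begin{bmatrix} 3 & \mathbf{-2} \\ -2 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= [\,x_1 \ \ x_2\,] \begin{bmatrix} 3x_1 - 2x_2 \\ -2x_1 + 7x_2 \end{bmatrix} \\
&= x_1(3x_1 - 2x_2) + x_2(-2x_1 + 7x_2) \\
&= 3x_1^2 - 2x_1x_2 - 2x_2x_1 + 7x_2^2 \\
&= 3x_1^2 - 4x_1x_2 + 7x_2^2
\end{aligned}
\]

■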


The presence of $-4x_1x_2$ in the quadratic form in Example 1(b) is due to the $-2$
entries off the diagonal in the matrix $A$. In contrast, the quadratic form associated with
the diagonal matrix $A$ in Example 1(a) has no $x_1x_2$ cross-product term.


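EXAMPLE 2 For $\mathbf{x}$ in $\mathbb{R}^3$, let $Q(\mathbf{x}) = 5x_1^2 + 3x_2^2 + 2x_3^2 - x_1x_2 + 8x_2x_3$. Write
this quadratic form as $\mathbf{x}^TA\mathbf{x}$.

SOLUTION The coefficients of $x_1^2$, $x_2^2$, $x_3^2$ go on the diagonal of $A$. To make $A$
symmetric, the coefficient of $x_ix_j$ for $i \neq j$ must be split evenly between the $(i, j)$- and
$(j, i)$-entries in $A$. The coefficient of $x_1x_3$ is 0. It is readily checked that

\[
Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = [\,x_1 \ \ x_2 \ \ x_3\,]
\begin{bmatrix} 5 & -1/2 & 0 \\ -1/2 & 3 & 4 \\ 0 & 4 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]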
EXAMPLE 3 Let $Q(\mathbf{x}) = x_1^2 - 8x_1x_2 - 5x_2^2$. Compute the value of $Q(\mathbf{x})$ for
$\mathbf{x} = \begin{bmatrix} -3 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ -2 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ -3 \end{bmatrix}$.


SOLUTION

\[
\begin{aligned}
Q(-3, 1) &= (-3)^2 - 8(-3)(1) - 5(1)^2 = 28 \\
Q(2, -2) &= (2)^2 - 8(2)(-2) - 5(-2)^2 = 16 \\
Q(1, -3) &= (1)^2 - 8(1)(-3) - 5(-3)^2 = -20
\end{aligned}
\]

■
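The evaluation of a quadratic form as $\mathbf{x}^TA\mathbf{x}$ can be sketched in a few lines of plain Python. The function name `quad_form` is an illustrative assumption, and the matrices are built by the rule of Example 2 (squared-term coefficients on the diagonal, half of each cross-product coefficient in the two symmetric off-diagonal positions).

```python
def quad_form(A, x):
    """Evaluate x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Matrix of Q(x) = x1^2 - 8 x1 x2 - 5 x2^2 (Example 3): the coefficient -8
# of the cross-product term is split into two off-diagonal entries of -4.
A3 = [[1, -4], [-4, -5]]
values = [quad_form(A3, x) for x in [(-3, 1), (2, -2), (1, -3)]]

# Matrix of Q(x) = 5 x1^2 + 3 x2^2 + 2 x3^2 - x1 x2 + 8 x2 x3 (Example 2).
A2 = [[5, -0.5, 0], [-0.5, 3, 4], [0, 4, 2]]

def q2_direct(x):
    """The same form as in Example 2, written out as a polynomial."""
    x1, x2, x3 = x
    return 5 * x1**2 + 3 * x2**2 + 2 * x3**2 - x1 * x2 + 8 * x2 * x3

sample = (1, 2, 3)   # arbitrary test point chosen for this sketch
agree = quad_form(A2, sample) == q2_direct(sample)
```

The list `values` reproduces the three numbers computed by hand in Example 3, and `agree` confirms that the matrix form and the polynomial form of Example 2 give the same value at the sample point.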

In some cases, quadratic forms are easier to use when they have no cross-product 
terms—that is, when the matrix of the quadratic form is a diagonal matrix. Fortunately, 
the cross-product term can be eliminated by making a suitable change of variable. 


Change of Variable in a Quadratic Form 

If $\mathbf{x}$ represents a variable vector in $\mathbb{R}^n$, then a change of variable is an equation of the
form

\[
\mathbf{x} = P\mathbf{y}, \quad \text{or equivalently,} \quad \mathbf{y} = P^{-1}\mathbf{x} \tag{1}
\]

where $P$ is an invertible matrix and $\mathbf{y}$ is a new variable vector in $\mathbb{R}^n$. Here $\mathbf{y}$ is the
coordinate vector of $\mathbf{x}$ relative to the basis of $\mathbb{R}^n$ determined by the columns of $P$. (See
Section 4.4.)

If the change of variable (1) is made in a quadratic form $\mathbf{x}^TA\mathbf{x}$, then

\[
\mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^TP^TAP\mathbf{y} = \mathbf{y}^T(P^TAP)\mathbf{y} \tag{2}
\]

and the new matrix of the quadratic form is $P^TAP$. Since $A$ is symmetric, Theorem 2
guarantees that there is an orthogonal matrix $P$ such that $P^TAP$ is a diagonal matrix $D$,
and the quadratic form in (2) becomes $\mathbf{y}^TD\mathbf{y}$. This is the strategy of the next example.


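EXAMPLE 4 Make a change of variable that transforms the quadratic form in
Example 3 into a quadratic form with no cross-product term.


SOLUTION The matrix of the quadratic form in Example 3 is

\[
A = \begin{bmatrix} 1 & -4 \\ -4 & -5 \end{bmatrix}
\]

The first step is to orthogonally diagonalize $A$. Its eigenvalues turn out to be $\lambda = 3$ and
$\lambda = -7$. Associated unit eigenvectors are

\[
\lambda = 3:\ \begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix}; \qquad
\lambda = -7:\ \begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}
\]

These vectors are automatically orthogonal (because they correspond to distinct
eigenvalues) and so provide an orthonormal basis for $\mathbb{R}^2$. Let

\[
P = \begin{bmatrix} 2/\sqrt{5} & 1/\sqrt{5} \\ -1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix}, \quad
D = \begin{bmatrix} 3 & 0 \\ 0 & -7 \end{bmatrix}
\]

Then $A = PDP^{-1}$ and $D = P^{-1}AP = P^TAP$, as pointed out earlier. A suitable change
of variable is

\[
\mathbf{x} = P\mathbf{y}, \quad \text{where } \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \text{ and } \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
\]

Then

\[
\begin{aligned}
x_1^2 - 8x_1x_2 - 5x_2^2 = \mathbf{x}^TA\mathbf{x} &= (P\mathbf{y})^TA(P\mathbf{y}) \\
&= \mathbf{y}^TP^TAP\mathbf{y} = \mathbf{y}^TD\mathbf{y} \\
&= 3y_1^2 - 7y_2^2
\end{aligned}
\]

■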


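To illustrate the meaning of the equality of quadratic forms in Example 4, we can
compute $Q(\mathbf{x})$ for $\mathbf{x} = (2, -2)$ using the new quadratic form. First, since $\mathbf{x} = P\mathbf{y}$,

\[
\mathbf{y} = P^{-1}\mathbf{x} = P^T\mathbf{x}
\]

so

\[
\mathbf{y} = \begin{bmatrix} 2/\sqrt{5} & -1/\sqrt{5} \\ 1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix} \begin{bmatrix} 2 \\ -2 \end{bmatrix}
= \begin{bmatrix} 6/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix}
\]

Hence

\[
3y_1^2 - 7y_2^2 = 3(6/\sqrt{5})^2 - 7(-2/\sqrt{5})^2 = 3(36/5) - 7(4/5) = 80/5 = 16
\]

This is the value of $Q(\mathbf{x})$ in Example 3 when $\mathbf{x} = (2, -2)$. See Fig. 1.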
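The agreement between the old and new forms can also be checked at other points. The sketch below is plain Python with illustrative function names (`q_old`, `to_y`, `q_new`); it uses the eigenvalues 3 and $-7$ of the matrix with rows $(1, -4)$ and $(-4, -5)$, whose unit eigenvectors form the columns of $P$.

```python
import math

s = math.sqrt(5)
# Columns of P are unit eigenvectors for the eigenvalues 3 and -7.
P = [[2 / s, 1 / s],
     [-1 / s, 2 / s]]

def q_old(x):
    """Q(x) = x1^2 - 8 x1 x2 - 5 x2^2 in the original variable."""
    return x[0] ** 2 - 8 * x[0] * x[1] - 5 * x[1] ** 2

def to_y(x):
    """Change of variable y = P^T x (P is orthogonal, so P^-1 = P^T)."""
    return [P[0][0] * x[0] + P[1][0] * x[1],
            P[0][1] * x[0] + P[1][1] * x[1]]

def q_new(y):
    """The same form in the new variable: 3 y1^2 - 7 y2^2."""
    return 3 * y[0] ** 2 - 7 * y[1] ** 2

points = [(2, -2), (-3, 1), (1, -3)]
pairs = [(q_old(x), q_new(to_y(x))) for x in points]
```

Each pair in `pairs` holds the same value twice, up to rounding error, which is what equation (2) predicts for an orthogonal change of variable.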



FIGURE 1 Change of variable in $\mathbf{x}^TA\mathbf{x}$.

Example 4 illustrates the following theorem. The proof of the theorem was essen¬ 
tially given before Example 4. 


THEOREM 4  The Principal Axes Theorem

Let $A$ be an $n \times n$ symmetric matrix. Then there is an orthogonal change of
variable, $\mathbf{x} = P\mathbf{y}$, that transforms the quadratic form $\mathbf{x}^TA\mathbf{x}$ into a quadratic form
$\mathbf{y}^TD\mathbf{y}$ with no cross-product term.


The columns of $P$ in the theorem are called the principal axes of the quadratic
form $\mathbf{x}^TA\mathbf{x}$. The vector $\mathbf{y}$ is the coordinate vector of $\mathbf{x}$ relative to the orthonormal basis
of $\mathbb{R}^n$ given by these principal axes.

A Geometric View of Principal Axes 

Suppose $Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$, where $A$ is an invertible $2 \times 2$ symmetric matrix, and let $c$ be a
constant. It can be shown that the set of all $\mathbf{x}$ in $\mathbb{R}^2$ that satisfy

\[
\mathbf{x}^TA\mathbf{x} = c \tag{3}
\]

either corresponds to an ellipse (or circle), a hyperbola, two intersecting lines, or a single
point, or contains no points at all. If $A$ is a diagonal matrix, the graph is in standard
position, such as in Fig. 2. If $A$ is not a diagonal matrix, the graph of equation (3) is
rotated out of standard position, as in Fig. 3. Finding the principal axes (determined
by the eigenvectors of $A$) amounts to finding a new coordinate system with respect to
which the graph is in standard position.

FIGURE 2 An ellipse and a hyperbola in standard position: the ellipse
$\dfrac{x_1^2}{a^2} + \dfrac{x_2^2}{b^2} = 1$ and the hyperbola $\dfrac{x_1^2}{a^2} - \dfrac{x_2^2}{b^2} = 1$, with $a > b > 0$.

FIGURE 3 An ellipse and a hyperbola not in standard position:
(a) $5x_1^2 - 4x_1x_2 + 5x_2^2 = 48$; (b) $x_1^2 - 8x_1x_2 - 5x_2^2 = 16$.

The hyperbola in Fig. 3(b) is the graph of the equation $\mathbf{x}^TA\mathbf{x} = 16$, where $A$ is the
matrix in Example 4. The positive $y_1$-axis in Fig. 3(b) is in the direction of the first
column of the matrix $P$ in Example 4, and the positive $y_2$-axis is in the direction of the
second column of $P$.


EXAMPLE 5 The ellipse in Fig. 3(a) is the graph of the equation $5x_1^2 - 4x_1x_2 +
5x_2^2 = 48$. Find a change of variable that removes the cross-product term from the
equation.


SOLUTION The matrix of the quadratic form is $A = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$. The eigenvalues of
$A$ turn out to be 3 and 7, with corresponding unit eigenvectors

\[
\mathbf{u}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}, \quad
\mathbf{u}_2 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}
\]

Let $P = [\,\mathbf{u}_1 \ \ \mathbf{u}_2\,] = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$. Then $P$ orthogonally diagonalizes $A$, so the
change of variable $\mathbf{x} = P\mathbf{y}$ produces the quadratic form $\mathbf{y}^TD\mathbf{y} = 3y_1^2 + 7y_2^2$. The new
axes for this change of variable are shown in Fig. 3(a).

■


Classifying Quadratic Forms 

When $A$ is an $n \times n$ matrix, the quadratic form $Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ is a real-valued function
with domain $\mathbb{R}^n$. Figure 4 displays the graphs of four quadratic forms with domain $\mathbb{R}^2$.
For each point $\mathbf{x} = (x_1, x_2)$ in the domain of a quadratic form $Q$, the graph displays the
point $(x_1, x_2, z)$ where $z = Q(\mathbf{x})$. Notice that except at $\mathbf{x} = \mathbf{0}$, the values of $Q(\mathbf{x})$ are
all positive in Fig. 4(a) and all negative in Fig. 4(d). The horizontal cross-sections of
the graphs are ellipses in Figs. 4(a) and 4(d) and hyperbolas in Fig. 4(c).



FIGURE 4 Graphs of quadratic forms. 


The simple 2x2 examples in Fig. 4 illustrate the following definitions. 


DEFINITION A quadratic form $Q$ is:

a. positive definite if $Q(\mathbf{x}) > 0$ for all $\mathbf{x} \neq \mathbf{0}$,

b. negative definite if $Q(\mathbf{x}) < 0$ for all $\mathbf{x} \neq \mathbf{0}$,

c. indefinite if $Q(\mathbf{x})$ assumes both positive and negative values.

Also, $Q$ is said to be positive semidefinite if $Q(\mathbf{x}) \geq 0$ for all $\mathbf{x}$, and to be negative
semidefinite if $Q(\mathbf{x}) \leq 0$ for all $\mathbf{x}$. The quadratic forms in parts (a) and (b) of Fig. 4 are
both positive semidefinite, but the form in (a) is better described as positive definite.
Theorem 5 characterizes some quadratic forms in terms of eigenvalues.

THEOREM 5 Quadratic Forms and Eigenvalues 

Let $A$ be an $n \times n$ symmetric matrix. Then a quadratic form $\mathbf{x}^TA\mathbf{x}$ is:

a. positive definite if and only if the eigenvalues of A are all positive, 

b. negative definite if and only if the eigenvalues of A are all negative, or 

c. indefinite if and only if A has both positive and negative eigenvalues. 

PROOF By the Principal Axes Theorem, there exists an orthogonal change of variable
$\mathbf{x} = P\mathbf{y}$ such that

\[
Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2 \tag{4}
\]

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$. Since $P$ is invertible, there is a one-to-one
correspondence between all nonzero $\mathbf{x}$ and all nonzero $\mathbf{y}$. Thus the values of $Q(\mathbf{x})$
for $\mathbf{x} \neq \mathbf{0}$ coincide with the values of the expression on the right side of (4), which
is obviously controlled by the signs of the eigenvalues $\lambda_1, \ldots, \lambda_n$, in the three ways
described in the theorem. ■


EXAMPLE 6 Is $Q(\mathbf{x}) = 3x_1^2 + 2x_2^2 + x_3^2 + 4x_1x_2 + 4x_2x_3$ positive definite?


SOLUTION Because of all the plus signs, this form “looks” positive definite. But the
matrix of the form is

\[
A = \begin{bmatrix} 3 & 2 & 0 \\ 2 & 2 & 2 \\ 0 & 2 & 1 \end{bmatrix}
\]

and the eigenvalues of $A$ turn out to be 5, 2, and $-1$. So $Q$ is an indefinite quadratic
form, not positive definite. ■
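The conclusion of Example 6 can be confirmed without an eigenvalue routine. The following plain-Python sketch (helper names `det3` and `char_poly` are assumptions of this illustration) checks with exact integer arithmetic that 5, 2, and $-1$ are roots of $\det(A - \lambda I)$, and then evaluates $Q$ at two vectors where it takes opposite signs. The vector $(1, -2, 2)$ was chosen here because it solves $(A + I)\mathbf{x} = \mathbf{0}$.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(A, lam):
    """Value of det(A - lam*I) for a 3x3 matrix A."""
    shifted = [[A[r][c] - (lam if r == c else 0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)

def Q(x):
    x1, x2, x3 = x
    return 3 * x1**2 + 2 * x2**2 + x3**2 + 4 * x1 * x2 + 4 * x2 * x3

A = [[3, 2, 0], [2, 2, 2], [0, 2, 1]]

# Candidate eigenvalues from the example; keep those that really are roots.
roots = [lam for lam in (5, 2, -1) if char_poly(A, lam) == 0]

positive_value = Q((1, 0, 0))    # Q is positive here
negative_value = Q((1, -2, 2))   # Q is negative at an eigenvector for -1
```

Since $Q$ takes both a positive and a negative value, it is indefinite, in agreement with Theorem 5(c).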


The classification of a quadratic form is often carried over to the matrix of the form.
Thus a positive definite matrix $A$ is a symmetric matrix for which the quadratic form
$\mathbf{x}^TA\mathbf{x}$ is positive definite. Other terms, such as positive semidefinite matrix, are defined
analogously.


NUMERICAL NOTE

A fast way to determine whether a symmetric matrix A is positive definite is 
to attempt to factor A in the form A = R T R, where R is upper triangular with 
positive diagonal entries. (A slightly modified algorithm for an LU factorization 
is one approach.) Such a Cholesky factorization is possible if and only if A is 
positive definite. See Supplementary Exercise 7 at the end of Chapter 7. 
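A minimal sketch of the test described in the Numerical Note follows. It uses the standard row-by-row Cholesky recurrence rather than the modified LU algorithm the note mentions, and the function name `cholesky_upper` is an assumption of this illustration. The function returns an upper triangular $R$ with positive diagonal entries and $A = R^TR$, or `None` when a nonpositive pivot shows that no such factorization exists. In floating-point arithmetic the comparison with zero is only reliable when the matrix is not close to singular.

```python
import math

def cholesky_upper(A):
    """Try to factor a symmetric matrix A as R^T R with R upper triangular
    and positive diagonal entries. Returns R, or None if A is not
    positive definite (a pivot is zero or negative)."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Pivot: what is left of A[i][i] after removing earlier rows of R.
        pivot = A[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if pivot <= 0:
            return None
        R[i][i] = math.sqrt(pivot)
        for j in range(i + 1, n):
            partial = sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = (A[i][j] - partial) / R[i][i]
    return R

# The indefinite matrix of Example 6 and the positive definite matrix of Example 5.
indefinite = [[3, 2, 0], [2, 2, 2], [0, 2, 1]]
definite = [[5, -2], [-2, 5]]

R_indefinite = cholesky_upper(indefinite)
R_definite = cholesky_upper(definite)
```

For the matrix of Example 6 the third pivot is negative, so the factorization fails; for the matrix of Example 5 (eigenvalues 3 and 7) it succeeds.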


PRACTICE PROBLEM 


Describe a positive semidefinite matrix A in terms of its eigenvalues. 



7.2 EXERCISES 


Compute the quadratic form x^4x, when A 
and 

「A , 

a. x = b. x = c. x : 

又2 


5 1/3' 


"•^1 " 


2 " 


" 1 /V 3 ' 

1/3 1 

a. x = 

x 2 

b. x = 

-1 

c. x = 

l/\/3 

" 1 " 


_X 3 _ 


5 


_1/V3_ 


2. Compute the quadratic form x T Ax ，for A 


4 3 

3 2 

0 1 


3. Find the matrix of the quadratic form. Assume $\mathbf{x}$ is in $\mathbb{R}^2$.
a. $10x_1^2 - 6x_1x_2 - 3x_2^2$ \qquad b. $5x_1^2 + 3x_1x_2$

4. Find the matrix of the quadratic form. Assume $\mathbf{x}$ is in $\mathbb{R}^2$.
a. $20x_1^2 + 15x_1x_2 - 10x_2^2$ \qquad b. $x_1x_2$

5. Find the matrix of the quadratic form. Assume $\mathbf{x}$ is in $\mathbb{R}^3$.

a. $8x_1^2 + 7x_2^2 - 3x_3^2 - 6x_1x_2 + 4x_1x_3 - 2x_2x_3$

b. $4x_1x_2 + 6x_1x_3 - 8x_2x_3$

6. Find the matrix of the quadratic form. Assume $\mathbf{x}$ is in $\mathbb{R}^3$.

a. $5x_1^2 - x_2^2 + 7x_3^2 + 5x_1x_2 - 3x_1x_3$

b. $x_3^2 - 4x_1x_2 + 4x_2x_3$

7. Make a change of variable, $\mathbf{x} = P\mathbf{y}$, that transforms the
quadratic form $x_1^2 + 10x_1x_2 + x_2^2$ into a quadratic form with
no cross-product term. Give $P$ and the new quadratic form.

8. Let $A$ be the matrix of the quadratic form

$9x_1^2 + 7x_2^2 + 11x_3^2 - 8x_1x_2 + 8x_1x_3$

It can be shown that the eigenvalues of $A$ are 3, 9, and 15.
Find an orthogonal matrix $P$ such that the change of variable
$\mathbf{x} = P\mathbf{y}$ transforms $\mathbf{x}^TA\mathbf{x}$ into a quadratic form with no
cross-product term. Give $P$ and the new quadratic form.

Classify the quadratic forms in Exercises 9-18. Then make a 
change of variable, x = Py, that transforms the quadratic form 
into one with no cross-product term. Write the new quadratic 
form. Construct P using the methods of Section 7.1. 

9. $3x_1^2 - 4x_1x_2 + 6x_2^2$ \qquad 10. $9x_1^2 - 8x_1x_2 + 3x_2^2$

11. $2x_1^2 + 10x_1x_2 + 2x_2^2$ \qquad 12. $-5x_1^2 + 4x_1x_2 - 2x_2^2$

13. x\ — 6 xi %2 + 14. + 6 x 1 X 2 

15. [M] —2x\ — 6 x\ — 9xl — 9x\ + ^X\X 2 + 4 xiX 3 + 4 xiX 4 + 

6 x 3 X 4 

16. [M] 4xf + + 4x| + + 3x\X2 + 3 x 3 X 4 — 4x\x^ + 

4x 2 x 3 

17. [M] xj + + X 4 + 9x\X 2 — \2x\x^ + 12 x 2 X 3 + 9 ^ 3 X 4 

18. [M] llxj — x 2 — 12x\X2 — 12x\X3 — Hxix^ — 2 x 3 X 4 

19. What is the largest possible value of the quadratic
form $5x_1^2 + 8x_2^2$ if $\mathbf{x} = (x_1, x_2)$ and $\mathbf{x}^T\mathbf{x} = 1$, that is, if
$x_1^2 + x_2^2 = 1$? (Try some examples of $\mathbf{x}$.)

20. What is the largest value of the quadratic form $5x_1^2 - 3x_2^2$ if
$\mathbf{x}^T\mathbf{x} = 1$?

In Exercises 21 and 22, matrices are $n \times n$ and vectors are in $\mathbb{R}^n$.
Mark each statement True or False. Justify each answer.

21. a. The matrix of a quadratic form is a symmetric matrix.

b. A quadratic form has no cross-product terms if and only
if the matrix of the quadratic form is a diagonal matrix.

c. The principal axes of a quadratic form $\mathbf{x}^TA\mathbf{x}$ are eigenvectors
of $A$.

d. A positive definite quadratic form $Q$ satisfies $Q(\mathbf{x}) > 0$
for all $\mathbf{x}$ in $\mathbb{R}^n$.

e. If the eigenvalues of a symmetric matrix $A$ are all positive,
then the quadratic form $\mathbf{x}^TA\mathbf{x}$ is positive definite.

f. A Cholesky factorization of a symmetric matrix $A$ has
the form $A = R^TR$, for an upper triangular matrix $R$ with
positive diagonal entries.

22. a. The expression $\|\mathbf{x}\|^2$ is a quadratic form.

b. If $A$ is symmetric and $P$ is an orthogonal matrix, then
the change of variable $\mathbf{x} = P\mathbf{y}$ transforms $\mathbf{x}^TA\mathbf{x}$ into a
quadratic form with no cross-product term.

c. If $A$ is a $2 \times 2$ symmetric matrix, then the set of $\mathbf{x}$ such
that $\mathbf{x}^TA\mathbf{x} = c$ (for a constant $c$) corresponds to either a
circle, an ellipse, or a hyperbola.

d. An indefinite quadratic form is either positive semidefinite
or negative semidefinite.

e. If $A$ is symmetric and the quadratic form $\mathbf{x}^TA\mathbf{x}$ has only
negative values for $\mathbf{x} \neq \mathbf{0}$, then the eigenvalues of $A$ are
all negative.


Exercises 23 and 24 show how to classify a quadratic form
$Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$, when $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ and $\det A \neq 0$, without finding
the eigenvalues of $A$.

23. If $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$, then the characteristic
polynomial of $A$ can be written in two ways: $\det(A - \lambda I)$
and $(\lambda - \lambda_1)(\lambda - \lambda_2)$. Use this fact to show that $\lambda_1 + \lambda_2 =
a + d$ (the diagonal entries of $A$) and $\lambda_1\lambda_2 = \det A$.

24. Verify the following statements.

a. $Q$ is positive definite if $\det A > 0$ and $a > 0$.

b. $Q$ is negative definite if $\det A > 0$ and $a < 0$.

c. $Q$ is indefinite if $\det A < 0$.

25. Show that if $B$ is $m \times n$, then $B^TB$ is positive semidefinite;
and if $B$ is $n \times n$ and invertible, then $B^TB$ is positive definite.

26. Show that if an $n \times n$ matrix $A$ is positive definite, then there
exists a positive definite matrix $B$ such that $A = B^TB$. [Hint:
Write $A = PDP^T$, with $P^T = P^{-1}$. Produce a diagonal
matrix $C$ such that $D = C^TC$, and let $B = PCP^T$. Show
that $B$ works.]

27. Let $A$ and $B$ be symmetric $n \times n$ matrices whose eigenvalues
are all positive. Show that the eigenvalues of $A + B$ are all
positive. [Hint: Consider quadratic forms.]

28. Let $A$ be an $n \times n$ invertible symmetric matrix. Show that
if the quadratic form $\mathbf{x}^TA\mathbf{x}$ is positive definite, then so is the
quadratic form $\mathbf{x}^TA^{-1}\mathbf{x}$. [Hint: Consider eigenvalues.]
SOLUTION TO PRACTICE PROBLEM

Make an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$, and write

\[
\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2
\]

as in equation (4). If an eigenvalue, say $\lambda_i$, were negative, then $\mathbf{x}^TA\mathbf{x}$ would be
negative for the $\mathbf{x}$ corresponding to $\mathbf{y} = \mathbf{e}_i$ (the $i$th column of $I_n$). So the eigenvalues
of a positive semidefinite quadratic form must all be nonnegative. Conversely, if the
eigenvalues are nonnegative, the expansion above shows that $\mathbf{x}^TA\mathbf{x}$ must be positive
semidefinite.


7.3 CONSTRAINED OPTIMIZATION 


Engineers, economists, scientists, and mathematicians often need to find the maximum 
or minimum value of a quadratic form Q (x) for x in some specified set. Typically, the 
problem can be arranged so that x varies over the set of unit vectors. This constrained 
optimization problem has an interesting and elegant solution. Example 6 below and the 
discussion in Section 7.5 will illustrate how such problems arise in practice. 

The requirement that a vector $\mathbf{x}$ in $\mathbb{R}^n$ be a unit vector can be stated in several
equivalent ways:

\[
\|\mathbf{x}\| = 1, \qquad \|\mathbf{x}\|^2 = 1, \qquad \mathbf{x}^T\mathbf{x} = 1
\]

and

\[
x_1^2 + x_2^2 + \cdots + x_n^2 = 1 \tag{1}
\]

The expanded version (1) of $\mathbf{x}^T\mathbf{x} = 1$ is commonly used in applications.

When a quadratic form $Q$ has no cross-product terms, it is easy to find the maximum
and minimum of $Q(\mathbf{x})$ for $\mathbf{x}^T\mathbf{x} = 1$.


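EXAMPLE 1 Find the maximum and minimum values of $Q(\mathbf{x}) = 9x_1^2 + 4x_2^2 + 3x_3^2$
subject to the constraint $\mathbf{x}^T\mathbf{x} = 1$.


SOLUTION Since $x_2^2$ and $x_3^2$ are nonnegative, note that

\[
4x_2^2 \leq 9x_2^2 \quad \text{and} \quad 3x_3^2 \leq 9x_3^2
\]

and hence

\[
\begin{aligned}
Q(\mathbf{x}) &= 9x_1^2 + 4x_2^2 + 3x_3^2 \\
&\leq 9x_1^2 + 9x_2^2 + 9x_3^2 \\
&= 9(x_1^2 + x_2^2 + x_3^2) \\
&= 9
\end{aligned}
\]

whenever $x_1^2 + x_2^2 + x_3^2 = 1$. So the maximum value of $Q(\mathbf{x})$ cannot exceed 9 when
$\mathbf{x}$ is a unit vector. Furthermore, $Q(\mathbf{x}) = 9$ when $\mathbf{x} = (1, 0, 0)$. Thus 9 is the maximum
value of $Q(\mathbf{x})$ for $\mathbf{x}^T\mathbf{x} = 1$.

To find the minimum value of $Q(\mathbf{x})$, observe that

\[
9x_1^2 \geq 3x_1^2, \qquad 4x_2^2 \geq 3x_2^2
\]

and hence

\[
Q(\mathbf{x}) \geq 3x_1^2 + 3x_2^2 + 3x_3^2 = 3(x_1^2 + x_2^2 + x_3^2) = 3
\]

whenever $x_1^2 + x_2^2 + x_3^2 = 1$. Also, $Q(\mathbf{x}) = 3$ when $x_1 = 0$, $x_2 = 0$, and $x_3 = 1$. So 3
is the minimum value of $Q(\mathbf{x})$ when $\mathbf{x}^T\mathbf{x} = 1$. ■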
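The bounds found in Example 1 can be illustrated by sampling. The sketch below is plain Python; the grid of unit vectors in spherical coordinates and the name `unit_vectors` are choices made for this illustration. It evaluates $Q$ on many unit vectors in $\mathbb{R}^3$ and records the largest and smallest sampled values. Sampling only illustrates the bounds; the proof is the inequality argument above.

```python
import math

def Q(x):
    """Q(x) = 9 x1^2 + 4 x2^2 + 3 x3^2 from Example 1."""
    return 9 * x[0] ** 2 + 4 * x[1] ** 2 + 3 * x[2] ** 2

def unit_vectors(steps=60):
    """Yield unit vectors in R^3 on a latitude-longitude grid."""
    for i in range(steps + 1):
        theta = math.pi * i / steps          # angle from the x3-axis
        for j in range(2 * steps):
            phi = math.pi * j / steps        # angle in the x1x2-plane
            yield (math.sin(theta) * math.cos(phi),
                   math.sin(theta) * math.sin(phi),
                   math.cos(theta))

values = [Q(x) for x in unit_vectors()]
largest, smallest = max(values), min(values)
```

The grid contains the points $(1, 0, 0)$ and $(0, 0, 1)$ up to rounding, so `largest` and `smallest` come out at the constrained maximum 9 and minimum 3, and every other sampled value lies between them.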
It is easy to see in Example 1 that the matrix of the quadratic form $Q$ has eigenvalues
9, 4, and 3 and that the greatest and least eigenvalues equal, respectively, the
(constrained) maximum and minimum of $Q(\mathbf{x})$. The same holds true for any quadratic
form, as we shall see.


EXAMPLE 2 Let $A = \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix}$, and let $Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ for $\mathbf{x}$ in $\mathbb{R}^2$. Figure 1
displays the graph of $Q$. Figure 2 shows only the portion of the graph inside a cylinder;
the intersection of the cylinder with the surface is the set of points $(x_1, x_2, z)$ such that
$z = Q(x_1, x_2)$ and $x_1^2 + x_2^2 = 1$. The “heights” of these points are the constrained
values of $Q(\mathbf{x})$. Geometrically, the constrained optimization problem is to locate the
highest and lowest points on the intersection curve.

The two highest points on the curve are 7 units above the $x_1x_2$-plane, occurring
where $x_1 = 0$ and $x_2 = \pm 1$. These points correspond to the eigenvalue 7 of $A$ and
the eigenvectors $\mathbf{x} = (0, 1)$ and $-\mathbf{x} = (0, -1)$. Similarly, the two lowest points on the
curve are 3 units above the $x_1x_2$-plane. They correspond to the eigenvalue 3 and the
eigenvectors $(1, 0)$ and $(-1, 0)$. ■

FIGURES 1 and 2 The graph of $z = 3x_1^2 + 7x_2^2$, and the intersection of
$z = 3x_1^2 + 7x_2^2$ and the cylinder $x_1^2 + x_2^2 = 1$.


Every point on the intersection curve in Fig. 2 has a $z$-coordinate between 3 and 7,
and for any number $t$ between 3 and 7, there is a unit vector $\mathbf{x}$ such that $Q(\mathbf{x}) = t$. In
other words, the set of all possible values of $\mathbf{x}^TA\mathbf{x}$, for $\|\mathbf{x}\| = 1$, is the closed interval
$3 \leq t \leq 7$.

It can be shown that for any symmetric matrix $A$, the set of all possible values of
$\mathbf{x}^TA\mathbf{x}$, for $\|\mathbf{x}\| = 1$, is a closed interval on the real axis. (See Exercise 13.) Denote the
left and right endpoints of this interval by $m$ and $M$, respectively. That is, let

\[
m = \min\{\mathbf{x}^TA\mathbf{x} : \|\mathbf{x}\| = 1\}, \qquad M = \max\{\mathbf{x}^TA\mathbf{x} : \|\mathbf{x}\| = 1\} \tag{2}
\]

Exercise 12 asks you to prove that if $\lambda$ is an eigenvalue of $A$, then $m \leq \lambda \leq M$. The
next theorem says that $m$ and $M$ are themselves eigenvalues of $A$, just as in Example 2.$^1$


THEOREM 6

Let $A$ be a symmetric matrix, and define $m$ and $M$ as in (2). Then $M$ is the greatest
eigenvalue $\lambda_1$ of $A$ and $m$ is the least eigenvalue of $A$. The value of $\mathbf{x}^TA\mathbf{x}$ is $M$
when $\mathbf{x}$ is a unit eigenvector $\mathbf{u}_1$ corresponding to $M$. The value of $\mathbf{x}^TA\mathbf{x}$ is $m$ when
$\mathbf{x}$ is a unit eigenvector corresponding to $m$.


$^1$ The use of minimum and maximum in (2), and least and greatest in the theorem, refers to the natural
ordering of the real numbers, not to magnitudes.
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PROOF Orthogonally diagonalize A as PDP~ l . We know that 

x^x = y T Dy when x = Py 


Also, 


l|x|| = II 尸 y|| = ||y|| for ally 


( 3 ) 


because P T P = I and ||Py|| 2 = (Py) T (Py) = y T P T Py = y T y = ||y|| 2 . In particular, 
||y || = 1 if and only if ||x|| = 1. Thus x T Ax and y T Dy assume the same set of values as 
x and y range over the set of all unit vectors. 

To simplify notation, suppose that ^4 is a 3 x 3 matrix with eigenvalues a > b > c. 
Arrange the (eigenvector) columns of P so that 尸 =[ui U 2 U 3 ] and 

a 0 0 

D = 0 b 0 

_0 0 c _ 

Given any unit vector y in R 3 with coordinates y\, y 2 , J 3 , observe that 

ay\ = ayl 
byl < ayj 

cyl < ayj 


and obtain these inequalities: 

y T Dy = ay\ + by\ + cy\ 

< ay\ + ay\ + ayj 
= a(y\ +y\ + yj) 

= a||y || 2 = a 

Thus M < a, by definition of M. However, y T Dy = a when y = ei = (1,0,0), so in 
fact M = a. By (3), the x that corresponds to y = ei is the eigenvector ui of A, because 

1 一 

0 = Ui 

0 

Thus M = a = De\ = uj^4ui, which proves the statement about M. A similar 
argument shows that m is the least eigenvalue, c, and this value of x^x is attained 
when x = Pe^ = U3. ■ 


x = 尸 ei = [im u 2 u 3 ] 
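Theorem 6 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix `A` is a hypothetical symmetric matrix chosen only for illustration and does not come from the text. It samples the quadratic form on random unit vectors and compares the sampled range with the least and greatest eigenvalues.

```python
import numpy as np

# Hypothetical symmetric matrix chosen for illustration (not from the text).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# eigh is meant for symmetric matrices: eigenvalues come back in ascending
# order and the columns of P are orthonormal eigenvectors.
eigvals, P = np.linalg.eigh(A)
m, M = eigvals[0], eigvals[-1]

# Evaluate Q(x) = x^T A x on many random unit vectors (one per row of X).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Q = np.einsum("ij,jk,ik->i", X, A, X)

# Values of the form at the unit eigenvectors for M and m.
q_at_u1 = P[:, -1] @ A @ P[:, -1]
q_at_un = P[:, 0] @ A @ P[:, 0]

print("sampled range:", Q.min(), Q.max())
print("eigenvalue range:", m, M)
```

Random sampling only suggests the bounds; the proof above is what guarantees them. The two eigenvector evaluations show that the endpoints are actually attained.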


EXAMPLE 3 Let $A = \begin{bmatrix} 3 & 2 & 1 \\ 2 & 3 & 1 \\ 1 & 1 & 4 \end{bmatrix}$. Find the maximum value of the quadratic
form $\mathbf{x}^TA\mathbf{x}$ subject to the constraint $\mathbf{x}^T\mathbf{x} = 1$, and find a unit vector at which this
maximum value is attained.

SOLUTION By Theorem 6, the desired maximum value is the greatest eigenvalue of
$A$. The characteristic equation turns out to be

\[ 0 = -\lambda^3 + 10\lambda^2 - 27\lambda + 18 = -(\lambda - 6)(\lambda - 3)(\lambda - 1) \]

The greatest eigenvalue is 6.

The constrained maximum of $\mathbf{x}^TA\mathbf{x}$ is attained when $\mathbf{x}$ is a unit eigenvector for
$\lambda = 6$. Solve $(A - 6I)\mathbf{x} = \mathbf{0}$ and find an eigenvector $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. Set $\mathbf{u}_1 = \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}$.

In Theorem 7 and in later applications, the values of $\mathbf{x}^TA\mathbf{x}$ are computed with
additional constraints on the unit vector $\mathbf{x}$.


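THEOREM 7

Let $A$, $\lambda_1$, and $\mathbf{u}_1$ be as in Theorem 6. Then the maximum value of $\mathbf{x}^TA\mathbf{x}$ subject
to the constraints

\[ \mathbf{x}^T\mathbf{x} = 1, \qquad \mathbf{x}^T\mathbf{u}_1 = 0 \]

is the second greatest eigenvalue, $\lambda_2$, and this maximum is attained when $\mathbf{x}$ is an
eigenvector $\mathbf{u}_2$ corresponding to $\lambda_2$.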


Theorem 7 can be proved by an argument similar to the one above in which the 
theorem is reduced to the case where the matrix of the quadratic form is diagonal. The 
next example gives an idea of the proof for the case of a diagonal matrix. 

EXAMPLE 4 Find the maximum value of $9x_1^2 + 4x_2^2 + 3x_3^2$ subject to the
constraints $\mathbf{x}^T\mathbf{x} = 1$ and $\mathbf{x}^T\mathbf{u}_1 = 0$, where $\mathbf{u}_1 = (1, 0, 0)$. Note that $\mathbf{u}_1$ is a unit eigenvector
corresponding to the greatest eigenvalue $\lambda = 9$ of the matrix of the quadratic form.

SOLUTION If the coordinates of $\mathbf{x}$ are $x_1, x_2, x_3$, then the constraint $\mathbf{x}^T\mathbf{u}_1 = 0$ means
simply that $x_1 = 0$. For such a unit vector, $x_2^2 + x_3^2 = 1$, and

\[
\begin{aligned}
9x_1^2 + 4x_2^2 + 3x_3^2 &= 4x_2^2 + 3x_3^2 \\
&\le 4x_2^2 + 4x_3^2 \\
&= 4(x_2^2 + x_3^2) \\
&= 4
\end{aligned}
\]

Thus the constrained maximum of the quadratic form does not exceed 4. And this value
is attained for $\mathbf{x} = (0, 1, 0)$, which is an eigenvector for the second greatest eigenvalue
of the matrix of the quadratic form. ■


EXAMPLE 5 Let $A$ be the matrix in Example 3 and let $\mathbf{u}_1$ be a unit eigenvector
corresponding to the greatest eigenvalue of $A$. Find the maximum value of $\mathbf{x}^TA\mathbf{x}$ subject
to the conditions

\[ \mathbf{x}^T\mathbf{x} = 1, \qquad \mathbf{x}^T\mathbf{u}_1 = 0 \tag{4} \]

SOLUTION From Example 3, the second greatest eigenvalue of $A$ is $\lambda = 3$. Solve
$(A - 3I)\mathbf{x} = \mathbf{0}$ to find an eigenvector, and normalize it to obtain

\[ \mathbf{u}_2 = \begin{bmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{bmatrix} \]

The vector $\mathbf{u}_2$ is automatically orthogonal to $\mathbf{u}_1$ because the vectors correspond to
different eigenvalues. Thus the maximum of $\mathbf{x}^TA\mathbf{x}$ subject to the constraints in (4) is 3,
attained when $\mathbf{x} = \mathbf{u}_2$. ■
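The constrained maximum in Examples 3 and 5 can be reproduced numerically. The sketch below assumes NumPy and assumes that the matrix of Example 3 is the symmetric matrix with eigenpairs $(6, (1,1,1))$, $(3, (1,1,-2))$, and $(1, (1,-1,0))$, which is consistent with the eigenvalues and eigenvectors quoted in those examples. The idea, an illustration rather than the text's own method, is to write every vector orthogonal to $\mathbf{u}_1$ as $\mathbf{x} = B\mathbf{z}$ for an orthonormal basis matrix `B`, so the constrained problem becomes an unconstrained one for the smaller symmetric matrix $B^TAB$.

```python
import numpy as np

# Matrix assumed for Example 3 (consistent with the eigen-data in the text).
A = np.array([[3.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])

eigvals, P = np.linalg.eigh(A)        # ascending eigenvalues
lam1, lam2 = eigvals[-1], eigvals[-2]
u1 = P[:, -1]                         # unit eigenvector for the greatest eigenvalue

# Orthonormal basis for {x : x^T u1 = 0}. QR of [u1 | e1 | e2] gives an
# orthogonal Q whose first column is +/- u1; the remaining columns span
# the orthogonal complement of u1.
Qfull, _ = np.linalg.qr(np.column_stack([u1, np.eye(3)[:, :2]]))
B = Qfull[:, 1:]                      # 3 x 2

# With x = B z and ||z|| = 1, x^T A x = z^T (B^T A B) z, so Theorem 6
# applied to B^T A B gives the constrained maximum.
constrained_max = np.linalg.eigvalsh(B.T @ A @ B)[-1]

print(lam1, lam2, constrained_max)
```

The same reduction works for the constraints in Theorem 8: stack $\mathbf{u}_1, \ldots, \mathbf{u}_{k-1}$ in front of the identity columns before taking the QR factorization.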

The next theorem generalizes Theorem 7 and, together with Theorem 6, gives a
useful characterization of all the eigenvalues of $A$. The proof is omitted.

THEOREM 8

Let $A$ be a symmetric $n \times n$ matrix with an orthogonal diagonalization
$A = PDP^{-1}$, where the entries on the diagonal of $D$ are arranged so that
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and where the columns of $P$ are corresponding unit
eigenvectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$. Then for $k = 2, \ldots, n$, the maximum value of $\mathbf{x}^TA\mathbf{x}$ subject
to the constraints

\[ \mathbf{x}^T\mathbf{x} = 1, \quad \mathbf{x}^T\mathbf{u}_1 = 0, \quad \ldots, \quad \mathbf{x}^T\mathbf{u}_{k-1} = 0 \]

is the eigenvalue $\lambda_k$, and this maximum is attained at $\mathbf{x} = \mathbf{u}_k$.


Theorem 8 will be helpful in Sections 7.4 and 7.5. The following application 
requires only Theorem 6. 

EXAMPLE 6 During the next year, a county government is planning to repair x 
hundred miles of public roads and bridges and to improve y hundred acres of parks 
and recreation areas. The county must decide how to allocate its resources (funds, 
equipment, labor, etc.) between these two projects. If it is more cost-effective to work 
simultaneously on both projects rather than on only one, then x and y might satisfy a 
constraint such as 

\[ 4x^2 + 9y^2 \le 36 \]

See Fig. 3. Each point $(x, y)$ in the shaded feasible set represents a possible public
works schedule for the year. The points on the constraint curve, $4x^2 + 9y^2 = 36$, use
the maximum amounts of resources available.




FIGURE 3 Public works schedules. (Horizontal axis: road and bridge repair, $x$, from 0 to 3;
vertical axis: parks and recreation, $y$, from 0 to 2; the feasible set is bounded by the curve $4x^2 + 9y^2 = 36$.)


In choosing its public works schedule, the county wants to consider the opinions of
the county residents. To measure the value, or utility, that the residents would assign to
the various work schedules $(x, y)$, economists sometimes use a function such as

\[ q(x, y) = xy \]

The set of points $(x, y)$ at which $q(x, y)$ is a constant is called an indifference curve.
Three such curves are shown in Fig. 4. Points along an indifference curve correspond
to alternatives that county residents as a group would find equally valuable.$^2$ Find the
public works schedule that maximizes the utility function $q$.


SOLUTION The constraint equation $4x^2 + 9y^2 = 36$ does not describe a set of unit
vectors, but a change of variable can fix that problem. Rewrite the constraint in the
form

\[ \left(\frac{x}{3}\right)^2 + \left(\frac{y}{2}\right)^2 = 1 \]

and define

\[ x_1 = \frac{x}{3}, \quad x_2 = \frac{y}{2}, \qquad \text{that is,} \quad x = 3x_1 \ \text{and} \ y = 2x_2 \]

Then the constraint equation becomes

\[ x_1^2 + x_2^2 = 1 \]

and the utility function becomes $q(3x_1, 2x_2) = (3x_1)(2x_2) = 6x_1x_2$. Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$.
Then the problem is to maximize $Q(\mathbf{x}) = 6x_1x_2$ subject to $\mathbf{x}^T\mathbf{x} = 1$. Note that $Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$, where

\[ A = \begin{bmatrix} 0 & 3 \\ 3 & 0 \end{bmatrix} \]

The eigenvalues of $A$ are $\pm 3$, with eigenvectors $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ for $\lambda = 3$ and $\begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ for
$\lambda = -3$. Thus the maximum value of $Q(\mathbf{x}) = q(x_1, x_2)$ is 3, attained when $x_1 = 1/\sqrt{2}$
and $x_2 = 1/\sqrt{2}$.

In terms of the original variables, the optimum public works schedule is $x = 3x_1 =
3/\sqrt{2} \approx 2.1$ hundred miles of roads and bridges and $y = 2x_2 = \sqrt{2} \approx 1.4$ hundred
acres of parks and recreational areas. The optimum public works schedule is the point
where the constraint curve and the indifference curve $q(x, y) = 3$ just meet. Points
$(x, y)$ with a higher utility lie on indifference curves that do not touch the constraint
curve. See Fig. 4. ■

FIGURE 4 The optimum public works schedule is (2.1, 1.4). (Axes as in Fig. 3; the constraint curve
$4x^2 + 9y^2 = 36$ is shown with the indifference curves $q(x, y) = 2$, $q(x, y) = 3$, and $q(x, y) = 4$.)

2 Indifference curves are discussed in Michael D. Intriligator, Ronald G. Bodkin, and Cheng Hsiao,
Econometric Models, Techniques, and Applications (Upper Saddle River, NJ: Prentice-Hall, 1996).
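The computation in Example 6 can be sketched in a few lines, assuming NumPy. The variable names (`x_miles`, `y_acres`) are illustrative choices, not notation from the text. The code finds the greatest eigenvalue of the matrix of $Q$, picks the unit eigenvector with nonnegative entries, and maps it back through the change of variable $x = 3x_1$, $y = 2x_2$.

```python
import numpy as np

# Matrix of Q(x) = 6 x1 x2 after the change of variable x = 3 x1, y = 2 x2.
A = np.array([[0.0, 3.0],
              [3.0, 0.0]])

eigvals, P = np.linalg.eigh(A)     # ascending: -3, 3
max_utility = eigvals[-1]
x_unit = P[:, -1]                  # unit eigenvector for the greatest eigenvalue

# An eigenvector is determined only up to sign; a schedule must be nonnegative.
if x_unit[0] < 0:
    x_unit = -x_unit

# Undo the change of variable.
x_miles = 3.0 * x_unit[0]          # hundreds of miles of roads and bridges
y_acres = 2.0 * x_unit[1]          # hundreds of acres of parks

print(max_utility, round(x_miles, 1), round(y_acres, 1))
```

The resulting point lies on the constraint curve, and its utility $xy$ equals the greatest eigenvalue, matching the values 2.1 and 1.4 quoted in the example.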


PRACTICE PROBLEMS 

1. Let $Q(\mathbf{x}) = 3x_1^2 + 3x_2^2 + 2x_1x_2$. Find a change of variable that transforms $Q$ into
a quadratic form with no cross-product term, and give the new quadratic form.

2. With $Q$ as in Problem 1, find the maximum value of $Q(\mathbf{x})$ subject to the constraint
$\mathbf{x}^T\mathbf{x} = 1$, and find a unit vector at which the maximum is attained.


7.3 EXERCISES

In Exercises 1 and 2, find the change of variable $\mathbf{x} = P\mathbf{y}$ that
transforms the quadratic form $\mathbf{x}^TA\mathbf{x}$ into $\mathbf{y}^TD\mathbf{y}$ as shown.

1. $5x_1^2 + 6x_2^2 + 7x_3^2 + 4x_1x_2 - 4x_2x_3 = 9y_1^2 + 6y_2^2 + 3y_3^2$

2. $3x_1^2 + 2x_2^2 + 2x_3^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3 = 5y_1^2 + 2y_2^2$
[Hint: $\mathbf{x}$ and $\mathbf{y}$ must have the same number of coordinates,
so the quadratic form shown here must have a coefficient of
zero for $y_3^2$.]

In Exercises 3-6, find (a) the maximum value of $Q(\mathbf{x})$ subject to
the constraint $\mathbf{x}^T\mathbf{x} = 1$, (b) a unit vector $\mathbf{u}$ where this maximum is
attained, and (c) the maximum of $Q(\mathbf{x})$ subject to the constraints
$\mathbf{x}^T\mathbf{x} = 1$ and $\mathbf{x}^T\mathbf{u} = 0$.

3. $Q(\mathbf{x}) = 5x_1^2 + 6x_2^2 + 7x_3^2 + 4x_1x_2 - 4x_2x_3$
(See Exercise 1.)

4. $Q(\mathbf{x}) = 3x_1^2 + 2x_2^2 + 2x_3^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3$
(See Exercise 2.)

5. $Q(\mathbf{x}) = 5x_1^2 + 5x_2^2 - 4x_1x_2$

6. $Q(\mathbf{x}) = 7x_1^2 + 3x_2^2 + 3x_1x_2$

7. Let $Q(\mathbf{x}) = -2x_1^2 - x_2^2 + 4x_1x_2 + 4x_2x_3$. Find a unit vector
$\mathbf{x}$ in $\mathbb{R}^3$ at which $Q(\mathbf{x})$ is maximized, subject to $\mathbf{x}^T\mathbf{x} = 1$.
[Hint: The eigenvalues of the matrix of the quadratic form
$Q$ are 2, $-1$, and $-4$.]

8. Let $Q(\mathbf{x}) = 7x_1^2 + x_2^2 + 7x_3^2 - 8x_1x_2 - 4x_1x_3 - 8x_2x_3$.
Find a unit vector $\mathbf{x}$ in $\mathbb{R}^3$ at which $Q(\mathbf{x})$ is maximized,
subject to $\mathbf{x}^T\mathbf{x} = 1$. [Hint: The eigenvalues of the matrix of
the quadratic form $Q$ are 9 and $-3$.]

9. Find the maximum value of $Q(\mathbf{x}) = 7x_1^2 + 3x_2^2 - 2x_1x_2$,
subject to the constraint $x_1^2 + x_2^2 = 1$. (Do not go on to find
a vector where the maximum is attained.)

10. Find the maximum value of $Q(\mathbf{x}) = -3x_1^2 + 5x_2^2 - 2x_1x_2$,
subject to the constraint $x_1^2 + x_2^2 = 1$. (Do not go on to find
a vector where the maximum is attained.)

11. Suppose $\mathbf{x}$ is a unit eigenvector of a matrix $A$ corresponding
to an eigenvalue 3. What is the value of $\mathbf{x}^TA\mathbf{x}$?

12. Let $\lambda$ be any eigenvalue of a symmetric matrix $A$. Justify
the statement made in this section that $m \le \lambda \le M$, where
$m$ and $M$ are defined as in (2). [Hint: Find an $\mathbf{x}$ such that
$\lambda = \mathbf{x}^TA\mathbf{x}$.]

13. Let $A$ be an $n \times n$ symmetric matrix, let $M$ and $m$ denote
the maximum and minimum values of the quadratic form
$\mathbf{x}^TA\mathbf{x}$, and denote corresponding unit eigenvectors by $\mathbf{u}_1$ and
$\mathbf{u}_n$. The following calculations show that given any number $t$
between $M$ and $m$, there is a unit vector $\mathbf{x}$ such that $t = \mathbf{x}^TA\mathbf{x}$.
Verify that $t = (1 - \alpha)m + \alpha M$ for some number $\alpha$ between
0 and 1. Then let $\mathbf{x} = \sqrt{1 - \alpha}\,\mathbf{u}_n + \sqrt{\alpha}\,\mathbf{u}_1$, and show that
$\mathbf{x}^T\mathbf{x} = 1$ and $\mathbf{x}^TA\mathbf{x} = t$.

[M] In Exercises 14-17, follow the instructions given for Exercises 3-6.

14. $x_1x_2 + 3x_1x_3 + 30x_1x_4 + 30x_2x_3 + 3x_2x_4 + x_3x_4$

15. $3x_1x_2 + 5x_1x_3 + 7x_1x_4 + 7x_2x_3 + 5x_2x_4 + 3x_3x_4$

16. $4x_1^2 - 6x_1x_2 - 10x_1x_3 - 10x_1x_4 - 6x_2x_3 - 6x_2x_4 - 2x_3x_4$

17. $-6x_1^2 - 10x_2^2 - 13x_3^2 - 13x_4^2 - 4x_1x_2 - 4x_1x_3 - 4x_1x_4 + 6x_3x_4$

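SOLUTIONS TO PRACTICE PROBLEMS

1. The matrix of the quadratic form is $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$. It is easy to find the eigenvalues,
4 and 2, and corresponding unit eigenvectors, $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ and $\begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$. So the
desired change of variable is $\mathbf{x} = P\mathbf{y}$, where $P = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$. (A common
error here is to forget to normalize the eigenvectors.) The new quadratic form is
$\mathbf{y}^TD\mathbf{y} = 4y_1^2 + 2y_2^2$.

2. The maximum of $Q(\mathbf{x})$ for $\mathbf{x}$ a unit vector is 4, and the maximum is attained at
the unit eigenvector $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$. [A common incorrect answer is $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$. This vector
maximizes the quadratic form $\mathbf{y}^TD\mathbf{y}$ instead of $Q(\mathbf{x})$.]

(Figure caption: The maximum value of $Q(\mathbf{x})$ subject to $\mathbf{x}^T\mathbf{x} = 1$ is 4.)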


7.4 THE SINGULAR VALUE DECOMPOSITION 


The diagonalization theorems in Sections 5.3 and 7.1 play a part in many interesting
applications. Unfortunately, as we know, not all matrices can be factored as $A = PDP^{-1}$
with $D$ diagonal. However, a factorization $A = QDP^{-1}$ is possible for any $m \times n$
matrix $A$! A special factorization of this type, called the singular value decomposition,
is one of the most useful matrix factorizations in applied linear algebra.

The singular value decomposition is based on the following property of the ordinary
diagonalization that can be imitated for rectangular matrices: The absolute values of the
eigenvalues of a symmetric matrix $A$ measure the amounts that $A$ stretches or shrinks
certain vectors (the eigenvectors). If $A\mathbf{x} = \lambda\mathbf{x}$ and $\|\mathbf{x}\| = 1$, then

\[ \|A\mathbf{x}\| = \|\lambda\mathbf{x}\| = |\lambda|\,\|\mathbf{x}\| = |\lambda| \tag{1} \]

If $\lambda_1$ is the eigenvalue with the greatest magnitude, then a corresponding unit
eigenvector $\mathbf{v}_1$ identifies a direction in which the stretching effect of $A$ is greatest. That is, the
length of $A\mathbf{x}$ is maximized when $\mathbf{x} = \mathbf{v}_1$, and $\|A\mathbf{v}_1\| = |\lambda_1|$, by (1). This description of
$\mathbf{v}_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value
decomposition.


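EXAMPLE 1 If $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$, then the linear transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps
the unit sphere $\{\mathbf{x} : \|\mathbf{x}\| = 1\}$ in $\mathbb{R}^3$ onto an ellipse in $\mathbb{R}^2$, shown in Fig. 1. Find a unit
vector $\mathbf{x}$ at which the length $\|A\mathbf{x}\|$ is maximized, and compute this maximum length.

FIGURE 1 A transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$.

SOLUTION The quantity $\|A\mathbf{x}\|^2$ is maximized at the same $\mathbf{x}$ that maximizes $\|A\mathbf{x}\|$, and
$\|A\mathbf{x}\|^2$ is easier to study. Observe that

\[ \|A\mathbf{x}\|^2 = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^TA^TA\mathbf{x} = \mathbf{x}^T(A^TA)\mathbf{x} \]

Also, $A^TA$ is a symmetric matrix, since $(A^TA)^T = A^TA^{TT} = A^TA$. So the problem now
is to maximize the quadratic form $\mathbf{x}^T(A^TA)\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$. By
Theorem 6 in Section 7.3, the maximum value is the greatest eigenvalue $\lambda_1$ of $A^TA$.
Also, the maximum value is attained at a unit eigenvector of $A^TA$ corresponding to $\lambda_1$.

For the matrix $A$ in this example,

\[ A^TA = \begin{bmatrix} 4 & 8 \\ 11 & 7 \\ 14 & -2 \end{bmatrix} \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} = \begin{bmatrix} 80 & 100 & 40 \\ 100 & 170 & 140 \\ 40 & 140 & 200 \end{bmatrix} \]

The eigenvalues of $A^TA$ are $\lambda_1 = 360$, $\lambda_2 = 90$, and $\lambda_3 = 0$. Corresponding unit
eigenvectors are, respectively,

\[ \mathbf{v}_1 = \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 2/3 \\ -2/3 \\ 1/3 \end{bmatrix} \]

The maximum value of $\|A\mathbf{x}\|^2$ is 360, attained when $\mathbf{x}$ is the unit vector $\mathbf{v}_1$. The vector
$A\mathbf{v}_1$ is a point on the ellipse in Fig. 1 farthest from the origin, namely,

\[ A\mathbf{v}_1 = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 18 \\ 6 \end{bmatrix} \]

For $\|\mathbf{x}\| = 1$, the maximum value of $\|A\mathbf{x}\|$ is $\|A\mathbf{v}_1\| = \sqrt{360} = 6\sqrt{10}$. ■

Example 1 suggests that the effect of $A$ on the unit sphere in $\mathbb{R}^3$ is related to the
quadratic form $\mathbf{x}^T(A^TA)\mathbf{x}$. In fact, the entire geometric behavior of the transformation
$\mathbf{x} \mapsto A\mathbf{x}$ is captured by this quadratic form, as we shall see.

The Singular Values of an m x n Matrix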

Let $A$ be an $m \times n$ matrix. Then $A^TA$ is symmetric and can be orthogonally diagonalized.
Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ be an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors of $A^TA$, and
let $\lambda_1, \ldots, \lambda_n$ be the associated eigenvalues of $A^TA$. Then, for $1 \le i \le n$,

\[
\begin{aligned}
\|A\mathbf{v}_i\|^2 &= (A\mathbf{v}_i)^TA\mathbf{v}_i = \mathbf{v}_i^TA^TA\mathbf{v}_i \\
&= \mathbf{v}_i^T(\lambda_i\mathbf{v}_i) && \text{Since } \mathbf{v}_i \text{ is an eigenvector of } A^TA \\
&= \lambda_i && \text{Since } \mathbf{v}_i \text{ is a unit vector}
\end{aligned}
\tag{2}
\]

So the eigenvalues of $A^TA$ are all nonnegative. By renumbering, if necessary, we may
assume that the eigenvalues are arranged so that

\[ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0 \]

The singular values of $A$ are the square roots of the eigenvalues of $A^TA$, denoted by
$\sigma_1, \ldots, \sigma_n$, and they are arranged in decreasing order. That is, $\sigma_i = \sqrt{\lambda_i}$ for $1 \le i \le n$.
By equation (2), the singular values of $A$ are the lengths of the vectors $A\mathbf{v}_1, \ldots, A\mathbf{v}_n$.
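The definition can be traced numerically for the matrix of Example 1. This is a minimal sketch, assuming NumPy; the variable names are illustrative. It computes the eigenvalues of $A^TA$, takes square roots, and compares the result with the lengths $\|A\mathbf{v}_i\|$ and with the singular values returned by `np.linalg.svd`.

```python
import numpy as np

# Matrix of Example 1.
A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

G = A.T @ A                               # symmetric 3 x 3 matrix A^T A
lam, V = np.linalg.eigh(G)                # ascending order
lam, V = lam[::-1], V[:, ::-1]            # reorder so lam[0] >= lam[1] >= lam[2]

# Rounding can make a zero eigenvalue slightly negative, so clip before the root.
sigma_from_eig = np.sqrt(np.clip(lam, 0.0, None))

# Equation (2): sigma_i is the length of A v_i.
lengths = np.linalg.norm(A @ V, axis=0)

# Library routine; returns min(m, n) = 2 singular values for this 2 x 3 matrix.
sigma_svd = np.linalg.svd(A, compute_uv=False)

print(sigma_from_eig)
print(lengths)
print(sigma_svd)
```

The first two entries agree with $6\sqrt{10}$ and $3\sqrt{10}$; the third value obtained from $A^TA$ is zero only up to rounding, which is one reason library routines do not form $A^TA$.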


EXAMPLE 2 Let $A$ be the matrix in Example 1. Since the eigenvalues of $A^TA$ are
360, 90, and 0, the singular values of $A$ are

\[ \sigma_1 = \sqrt{360} = 6\sqrt{10}, \quad \sigma_2 = \sqrt{90} = 3\sqrt{10}, \quad \sigma_3 = 0 \]

From Example 1, the first singular value of $A$ is the maximum of $\|A\mathbf{x}\|$ over all unit
vectors, and the maximum is attained at the unit eigenvector $\mathbf{v}_1$. Theorem 7 in Section
7.3 shows that the second singular value of $A$ is the maximum of $\|A\mathbf{x}\|$ over all unit
vectors that are orthogonal to $\mathbf{v}_1$, and this maximum is attained at the second unit
eigenvector, $\mathbf{v}_2$ (Exercise 22). For the $\mathbf{v}_2$ in Example 1,

\[ A\mathbf{v}_2 = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} \begin{bmatrix} -2/3 \\ -1/3 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 3 \\ -9 \end{bmatrix} \]

This point is on the minor axis of the ellipse in Fig. 1, just as $A\mathbf{v}_1$ is on the major axis.
(See Fig. 2.) The first two singular values of $A$ are the lengths of the major and minor
semiaxes of the ellipse. ■

FIGURE 2


The fact that $A\mathbf{v}_1$ and $A\mathbf{v}_2$ are orthogonal in Fig. 2 is no accident, as the next theorem
shows.

THEOREM 9 Suppose $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of
$A^TA$, arranged so that the corresponding eigenvalues of $A^TA$ satisfy $\lambda_1 \ge \cdots \ge \lambda_n$,
and suppose $A$ has $r$ nonzero singular values. Then $\{A\mathbf{v}_1, \ldots, A\mathbf{v}_r\}$ is an
orthogonal basis for Col $A$, and rank $A = r$.

PROOF Because $\mathbf{v}_i$ and $\lambda_j\mathbf{v}_j$ are orthogonal for $i \ne j$,

\[ (A\mathbf{v}_i)^T(A\mathbf{v}_j) = \mathbf{v}_i^TA^TA\mathbf{v}_j = \mathbf{v}_i^T(\lambda_j\mathbf{v}_j) = 0 \]

Thus $\{A\mathbf{v}_1, \ldots, A\mathbf{v}_n\}$ is an orthogonal set. Furthermore, since the lengths of the
vectors $A\mathbf{v}_1, \ldots, A\mathbf{v}_n$ are the singular values of $A$, and since there are $r$ nonzero singular
values, $A\mathbf{v}_i \ne \mathbf{0}$ if and only if $1 \le i \le r$. So $A\mathbf{v}_1, \ldots, A\mathbf{v}_r$ are linearly independent
vectors, and they are in Col $A$. Finally, for any $\mathbf{y}$ in Col $A$, say $\mathbf{y} = A\mathbf{x}$, we can write
$\mathbf{x} = c_1\mathbf{v}_1 + \cdots + c_n\mathbf{v}_n$, and

\[
\begin{aligned}
\mathbf{y} = A\mathbf{x} &= c_1A\mathbf{v}_1 + \cdots + c_rA\mathbf{v}_r + c_{r+1}A\mathbf{v}_{r+1} + \cdots + c_nA\mathbf{v}_n \\
&= c_1A\mathbf{v}_1 + \cdots + c_rA\mathbf{v}_r + \mathbf{0} + \cdots + \mathbf{0}
\end{aligned}
\]

Thus $\mathbf{y}$ is in Span $\{A\mathbf{v}_1, \ldots, A\mathbf{v}_r\}$, which shows that $\{A\mathbf{v}_1, \ldots, A\mathbf{v}_r\}$ is an (orthogonal)
basis for Col $A$. Hence rank $A$ = dim Col $A = r$. ■

NUMERICAL NOTE

In some cases, the rank of $A$ may be very sensitive to small changes in the entries
of $A$. The obvious method of counting the number of pivot columns in $A$ does
not work well if $A$ is row reduced by a computer. Roundoff error often creates
an echelon form with full rank.

In practice, the most reliable way to estimate the rank of a large matrix $A$
is to count the number of nonzero singular values. In this case, extremely small
nonzero singular values are assumed to be zero for all practical purposes, and the
effective rank of the matrix is the number obtained by counting the remaining
nonzero singular values.$^1$
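A sketch of the effective-rank idea, assuming NumPy and floating-point input. The helper `effective_rank` and its default tolerance are illustrative choices (the default mirrors a common convention, largest singular value times the larger matrix dimension times machine epsilon), not something prescribed by the text. The test matrix is a hypothetical rank-2 matrix with a tiny perturbation added.

```python
import numpy as np

def effective_rank(A, tol=None):
    """Count singular values of a float matrix A that exceed a tolerance."""
    s = np.linalg.svd(A, compute_uv=False)
    if tol is None:
        # Assumed default: scale machine epsilon by the matrix size and norm.
        tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
    return int(np.sum(s > tol))

# Hypothetical data: an exactly rank-2 product plus noise of size about 1e-13.
rng = np.random.default_rng(1)
B = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
noisy = B + 1e-13 * rng.normal(size=B.shape)

print(np.linalg.svd(noisy, compute_uv=False))   # two large values, three tiny ones
print(effective_rank(noisy, tol=1e-8))          # tiny values treated as zero
print(effective_rank(noisy, tol=0.0))           # every positive value counted
```

The choice of tolerance is the delicate part: it encodes how large an entry error is considered plausible for the data at hand. NumPy's `np.linalg.matrix_rank` accepts a `tol` argument for the same purpose.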


The Singular Value Decomposition 

The decomposition of $A$ involves an $m \times n$ “diagonal” matrix $\Sigma$ of the form

\[ \Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \tag{3} \]

in which the lower blocks have $m - r$ rows and the right-hand blocks have $n - r$ columns,
where $D$ is an $r \times r$ diagonal matrix for some $r$ not exceeding the smaller of $m$ and $n$.
(If $r$ equals $m$ or $n$ or both, some or all of the zero matrices do not appear.)

THEOREM 10 The Singular Value Decomposition

Let $A$ be an $m \times n$ matrix with rank $r$. Then there exists an $m \times n$ matrix $\Sigma$ as
in (3) for which the diagonal entries in $D$ are the first $r$ singular values of $A$,
$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, and there exist an $m \times m$ orthogonal matrix $U$ and an
$n \times n$ orthogonal matrix $V$ such that

\[ A = U\Sigma V^T \]

Any factorization $A = U\Sigma V^T$, with $U$ and $V$ orthogonal, $\Sigma$ as in (3), and positive
diagonal entries in $D$, is called a singular value decomposition (or SVD) of $A$. The
matrices $U$ and $V$ are not uniquely determined by $A$, but the diagonal entries of $\Sigma$ are
necessarily the singular values of $A$. See Exercise 19. The columns of $U$ in such a
decomposition are called left singular vectors of $A$, and the columns of $V$ are called
right singular vectors of $A$.


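1 In general, rank estimation is not a simple problem. For a discussion of the subtle issues involved, see
Philip E. Gill, Walter Murray, and Margaret H. Wright, Numerical Linear Algebra and Optimization, vol. 1
(Redwood City, CA: Addison-Wesley, 1991), Sec. 5.8.

PROOF Let $\lambda_i$ and $\mathbf{v}_i$ be as in Theorem 9, so that $\{A\mathbf{v}_1, \ldots, A\mathbf{v}_r\}$ is an orthogonal basis
for Col $A$. Normalize each $A\mathbf{v}_i$ to obtain an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_r\}$, where

\[ \mathbf{u}_i = \frac{1}{\|A\mathbf{v}_i\|}A\mathbf{v}_i = \frac{1}{\sigma_i}A\mathbf{v}_i \]

and

\[ A\mathbf{v}_i = \sigma_i\mathbf{u}_i \qquad (1 \le i \le r) \tag{4} \]

Now extend $\{\mathbf{u}_1, \ldots, \mathbf{u}_r\}$ to an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_m\}$ of $\mathbb{R}^m$, and let

\[ U = [\,\mathbf{u}_1 \ \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_m\,] \quad \text{and} \quad V = [\,\mathbf{v}_1 \ \ \mathbf{v}_2 \ \cdots \ \mathbf{v}_n\,] \]

By construction, $U$ and $V$ are orthogonal matrices. Also, from (4),

\[ AV = [\,A\mathbf{v}_1 \ \cdots \ A\mathbf{v}_r \ \ \mathbf{0} \ \cdots \ \mathbf{0}\,] = [\,\sigma_1\mathbf{u}_1 \ \cdots \ \sigma_r\mathbf{u}_r \ \ \mathbf{0} \ \cdots \ \mathbf{0}\,] \]

Let $D$ be the diagonal matrix with diagonal entries $\sigma_1, \ldots, \sigma_r$, and let $\Sigma$ be as in
(3) above. Then

\[
\begin{aligned}
U\Sigma &= [\,\mathbf{u}_1 \ \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_m\,]
\begin{bmatrix}
\sigma_1 & & & 0 & \\
 & \sigma_2 & & & 0 \\
 & & \ddots & & \\
0 & & & \sigma_r & \\
 & 0 & & & 0
\end{bmatrix} \\
&= [\,\sigma_1\mathbf{u}_1 \ \cdots \ \sigma_r\mathbf{u}_r \ \ \mathbf{0} \ \cdots \ \mathbf{0}\,] \\
&= AV
\end{aligned}
\]

Since $V$ is an orthogonal matrix, $U\Sigma V^T = AVV^T = A$. ■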


The next two examples focus attention on the internal structure of a singular value 
decomposition. An efficient and numerically stable algorithm for this decomposition 
would use a different approach. See the Numerical Note at the end of the section. 


EXAMPLE 3 Use the results of Examples 1 and 2 to construct a singular value
decomposition of $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$.

SOLUTION A construction can be divided into three steps.

Step 1. Find an orthogonal diagonalization of $A^TA$. That is, find the eigenvalues of
$A^TA$ and a corresponding orthonormal set of eigenvectors. If $A$ had only two columns,
the calculations could be done by hand. Larger matrices usually require a matrix
program.$^2$ However, for the matrix $A$ here, the eigendata for $A^TA$ are provided in Example 1.

Step 2. Set up $V$ and $\Sigma$. Arrange the eigenvalues of $A^TA$ in decreasing order. In
Example 1, the eigenvalues are already listed in decreasing order: 360, 90, and 0. The
corresponding unit eigenvectors, $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, are the right singular vectors of $A$. Using
Example 1, construct

\[ V = [\,\mathbf{v}_1 \ \ \mathbf{v}_2 \ \ \mathbf{v}_3\,] = \begin{bmatrix} 1/3 & -2/3 & 2/3 \\ 2/3 & -1/3 & -2/3 \\ 2/3 & 2/3 & 1/3 \end{bmatrix} \]

The square roots of the eigenvalues are the singular values:

\[ \sigma_1 = 6\sqrt{10}, \quad \sigma_2 = 3\sqrt{10}, \quad \sigma_3 = 0 \]

The nonzero singular values are the diagonal entries of $D$. The matrix $\Sigma$ is the same
size as $A$, with $D$ in its upper left corner and with 0's elsewhere.

\[ D = \begin{bmatrix} 6\sqrt{10} & 0 \\ 0 & 3\sqrt{10} \end{bmatrix}, \qquad \Sigma = [\,D \ \ 0\,] = \begin{bmatrix} 6\sqrt{10} & 0 & 0 \\ 0 & 3\sqrt{10} & 0 \end{bmatrix} \]

Step 3. Construct $U$. When $A$ has rank $r$, the first $r$ columns of $U$ are the normalized
vectors obtained from $A\mathbf{v}_1, \ldots, A\mathbf{v}_r$. In this example, $A$ has two nonzero singular
values, so rank $A = 2$. Recall from equation (2) and the paragraph before Example
2 that $\|A\mathbf{v}_1\| = \sigma_1$ and $\|A\mathbf{v}_2\| = \sigma_2$. Thus

\[ \mathbf{u}_1 = \frac{1}{\sigma_1}A\mathbf{v}_1 = \frac{1}{6\sqrt{10}}\begin{bmatrix} 18 \\ 6 \end{bmatrix} = \begin{bmatrix} 3/\sqrt{10} \\ 1/\sqrt{10} \end{bmatrix} \]

\[ \mathbf{u}_2 = \frac{1}{\sigma_2}A\mathbf{v}_2 = \frac{1}{3\sqrt{10}}\begin{bmatrix} 3 \\ -9 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{10} \\ -3/\sqrt{10} \end{bmatrix} \]

Note that $\{\mathbf{u}_1, \mathbf{u}_2\}$ is already a basis for $\mathbb{R}^2$. Thus no additional vectors are needed for
$U$, and $U = [\,\mathbf{u}_1 \ \ \mathbf{u}_2\,]$. The singular value decomposition of $A$ is

\[ A = \underbrace{\begin{bmatrix} 3/\sqrt{10} & 1/\sqrt{10} \\ 1/\sqrt{10} & -3/\sqrt{10} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 6\sqrt{10} & 0 & 0 \\ 0 & 3\sqrt{10} & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} 1/3 & 2/3 & 2/3 \\ -2/3 & -1/3 & 2/3 \\ 2/3 & -2/3 & 1/3 \end{bmatrix}}_{V^T} \]

■

2 See the Study Guide for software and graphing calculator commands. MATLAB, for instance, can produce
both the eigenvalues and the eigenvectors with one command, eig.

EXAMPLE 4 Find a singular value decomposition of $A = \begin{bmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{bmatrix}$.


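SOLUTION First, compute $A^TA = \begin{bmatrix} 9 & -9 \\ -9 & 9 \end{bmatrix}$. The eigenvalues of $A^TA$ are 18 and 0,
with corresponding unit eigenvectors

\[ \mathbf{v}_1 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} \]

These unit vectors form the columns of $V$:

\[ V = [\,\mathbf{v}_1 \ \ \mathbf{v}_2\,] = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \]

The singular values are $\sigma_1 = \sqrt{18} = 3\sqrt{2}$ and $\sigma_2 = 0$. Since there is only one nonzero
singular value, the “matrix” $D$ may be written as a single number. That is, $D = 3\sqrt{2}$.
The matrix $\Sigma$ is the same size as $A$, with $D$ in its upper left corner:

\[ \Sigma = \begin{bmatrix} D & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \]

To construct $U$, first construct $A\mathbf{v}_1$ and $A\mathbf{v}_2$:

\[ A\mathbf{v}_1 = \begin{bmatrix} 2/\sqrt{2} \\ -4/\sqrt{2} \\ 4/\sqrt{2} \end{bmatrix}, \quad A\mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

As a check on the calculations, verify that $\|A\mathbf{v}_1\| = \sigma_1 = 3\sqrt{2}$. Of course, $A\mathbf{v}_2 = \mathbf{0}$
because $\|A\mathbf{v}_2\| = \sigma_2 = 0$. The only column found for $U$ so far is

\[ \mathbf{u}_1 = \frac{1}{3\sqrt{2}}A\mathbf{v}_1 = \begin{bmatrix} 1/3 \\ -2/3 \\ 2/3 \end{bmatrix} \]

The other columns of $U$ are found by extending the set $\{\mathbf{u}_1\}$ to an orthonormal basis for
$\mathbb{R}^3$. In this case, we need two orthogonal unit vectors $\mathbf{u}_2$ and $\mathbf{u}_3$ that are orthogonal to
$\mathbf{u}_1$. (See Fig. 3.) Each vector must satisfy $\mathbf{u}_1^T\mathbf{x} = 0$, which is equivalent to the equation
$x_1 - 2x_2 + 2x_3 = 0$. A basis for the solution set of this equation is

\[ \mathbf{w}_1 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{w}_2 = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} \]

(Check that $\mathbf{w}_1$ and $\mathbf{w}_2$ are each orthogonal to $\mathbf{u}_1$.) Apply the Gram-Schmidt process
(with normalizations) to $\{\mathbf{w}_1, \mathbf{w}_2\}$, and obtain

\[ \mathbf{u}_2 = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \\ 0 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} -2/\sqrt{45} \\ 4/\sqrt{45} \\ 5/\sqrt{45} \end{bmatrix} \]

Finally, set $U = [\,\mathbf{u}_1 \ \ \mathbf{u}_2 \ \ \mathbf{u}_3\,]$, take $\Sigma$ and $V^T$ from above, and write

\[ A = \begin{bmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{bmatrix} = \begin{bmatrix} 1/3 & 2/\sqrt{5} & -2/\sqrt{45} \\ -2/3 & 1/\sqrt{5} & 4/\sqrt{45} \\ 2/3 & 0 & 5/\sqrt{45} \end{bmatrix} \begin{bmatrix} 3\sqrt{2} & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \]

■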
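The three steps used in Examples 3 and 4 can be sketched as code. This is a teaching sketch assuming NumPy, not a recommended algorithm: it forms $A^TA$ explicitly, which the Numerical Note at the end of the section warns against. The function name `svd_via_gram` and the tolerance `tol` are illustrative assumptions, and the extension of $\{\mathbf{u}_1, \ldots, \mathbf{u}_r\}$ to an orthonormal basis uses a QR factorization in place of the hand Gram-Schmidt computation in Example 4.

```python
import numpy as np

def svd_via_gram(A, tol=1e-6):
    """Return U, S, V with A = U @ S @ V.T, following the three-step construction."""
    m, n = A.shape

    # Step 1: orthogonally diagonalize A^T A (eigh returns ascending order).
    lam, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]
    lam, V = np.clip(lam[order], 0.0, None), V[:, order]

    # Step 2: singular values and Sigma. Because square roots of rounding
    # errors are taken, "zero" is judged with a fairly loose relative tolerance.
    sigma = np.sqrt(lam)
    r = int(np.sum(sigma > tol * sigma[0]))
    S = np.zeros((m, n))
    S[:r, :r] = np.diag(sigma[:r])

    # Step 3: u_i = A v_i / sigma_i for i <= r, then extend to a basis of R^m.
    U_r = (A @ V[:, :r]) / sigma[:r]
    Q, _ = np.linalg.qr(np.hstack([U_r, np.eye(m)]))
    U = np.hstack([U_r, Q[:, r:m]])     # trailing columns of Q are orthogonal to U_r
    return U, S, V

A4 = np.array([[1.0, -1.0], [-2.0, 2.0], [2.0, -2.0]])    # Example 4
U, S, V = svd_via_gram(A4)
print(np.round(S, 4))
print(np.allclose(U @ S @ V.T, A4))
```

The columns of $U$ and $V$ produced this way may differ from those in the examples by sign or, for the extension columns, by a rotation within the orthogonal complement; this reflects the statement above that $U$ and $V$ are not uniquely determined by $A$.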


Applications of the Singular Value Decomposition 

The SVD is often used to estimate the rank of a matrix, as noted above. Several other
numerical applications are described briefly below, and an application to image processing
is presented in Section 7.5.

EXAMPLE 5 (The Condition Number) Most numerical calculations involving an
equation $A\mathbf{x} = \mathbf{b}$ are as reliable as possible when the SVD of $A$ is used. The two
orthogonal matrices $U$ and $V$ do not affect lengths of vectors or angles between vectors
(Theorem 7 in Section 6.2). Any possible instabilities in numerical calculations are
identified in $\Sigma$. If the singular values of $A$ are extremely large or small, roundoff errors
are almost inevitable, but an error analysis is aided by knowing the entries in $\Sigma$ and $V$.

If $A$ is an invertible $n \times n$ matrix, then the ratio $\sigma_1/\sigma_n$ of the largest and smallest
singular values gives the condition number of $A$. Exercises 41-43 in Section 2.3
showed how the condition number affects the sensitivity of a solution of $A\mathbf{x} = \mathbf{b}$ to
changes (or errors) in the entries of $A$. (Actually, a “condition number” of $A$ can be
computed in several ways, but the definition given here is widely used for studying
$A\mathbf{x} = \mathbf{b}$.) ■
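A short illustration of the ratio $\sigma_1/\sigma_n$, assuming NumPy. The nearly singular matrix below is a hypothetical example, not one from the text. The ratio computed from the singular values is compared with `np.linalg.cond`, whose default is the same 2-norm condition number.

```python
import numpy as np

# Hypothetical nearly singular 2 x 2 matrix (rows are almost parallel).
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])

s = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
cond = s[0] / s[-1]                      # sigma_1 / sigma_n

print(cond)
print(np.linalg.cond(A))                 # same quantity via the library routine
```

A condition number of this size means that relative errors in the data can be magnified by roughly that factor in the solution of $A\mathbf{x} = \mathbf{b}$.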

EXAMPLE 6 (Bases for Fundamental Subspaces) Given an SVD for an $m \times n$
matrix $A$, let $\mathbf{u}_1, \ldots, \mathbf{u}_m$ be the left singular vectors, $\mathbf{v}_1, \ldots, \mathbf{v}_n$ the right singular vectors,
and $\sigma_1, \ldots, \sigma_n$ the singular values, and let $r$ be the rank of $A$. By Theorem 9,

\[ \{\mathbf{u}_1, \ldots, \mathbf{u}_r\} \tag{5} \]

is an orthonormal basis for Col $A$.

Recall from Theorem 3 in Section 6.1 that $(\text{Col } A)^\perp = \text{Nul } A^T$. Hence

\[ \{\mathbf{u}_{r+1}, \ldots, \mathbf{u}_m\} \tag{6} \]

is an orthonormal basis for Nul $A^T$.

Since $\|A\mathbf{v}_i\| = \sigma_i$ for $1 \le i \le n$, and $\sigma_i$ is 0 if and only if $i > r$, the vectors
$\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n$ span a subspace of Nul $A$ of dimension $n - r$. By the Rank Theorem,
dim Nul $A = n - \text{rank } A$. It follows that

\[ \{\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\} \tag{7} \]

is an orthonormal basis for Nul $A$, by the Basis Theorem (in Section 4.5).

From (5) and (6), the orthogonal complement of Nul $A^T$ is Col $A$. Interchanging $A$
and $A^T$, note that $(\text{Nul } A)^\perp = \text{Col } A^T = \text{Row } A$. Hence, from (7),

\[ \{\mathbf{v}_1, \ldots, \mathbf{v}_r\} \tag{8} \]

is an orthonormal basis for Row $A$.

Figure 4 summarizes (5)-(8), but shows the orthogonal basis $\{\sigma_1\mathbf{u}_1, \ldots, \sigma_r\mathbf{u}_r\}$ for
Col $A$ instead of the normalized basis, to remind you that $A\mathbf{v}_i = \sigma_i\mathbf{u}_i$ for $1 \le i \le r$.
Explicit orthonormal bases for the four fundamental subspaces determined by $A$ are
useful in some calculations, particularly in constrained optimization problems. ■

(Figure caption: The fundamental subspaces in Example 4.)
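The bases (5)-(8) can be read directly off a computed SVD. The following sketch assumes NumPy and uses the matrix of Example 4; the relative tolerance `1e-10` used to decide the rank is an assumption made for illustration.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])             # matrix of Example 4

U, s, Vt = np.linalg.svd(A)             # full SVD: U is 3 x 3, Vt is V^T (2 x 2)
r = int(np.sum(s > 1e-10 * s[0]))       # numerical rank (assumed tolerance)

col_A = U[:, :r]       # (5) orthonormal basis for Col A
nul_At = U[:, r:]      # (6) orthonormal basis for Nul A^T
nul_A = Vt[r:].T       # (7) orthonormal basis for Nul A
row_A = Vt[:r].T       # (8) orthonormal basis for Row A

print(r)
print(np.allclose(A @ nul_A, 0), np.allclose(A.T @ nul_At, 0))
```

Note that `np.linalg.svd` returns $V^T$, so right singular vectors are rows of `Vt`. Projecting the columns of $A$ onto the span of `col_A`, or the rows of $A$ onto the span of `row_A`, leaves $A$ unchanged, which is another way to confirm the bases.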


FIGURE 4 The four fundamental subspaces and the action of $A$ (multiplication by $A$).

The four fundamental subspaces and the concept of singular values provide the final 
statements of the Invertible Matrix Theorem. (Recall that statements about A T have 
been omitted from the theorem, to avoid nearly doubling the number of statements.) 
The other statements were given in Sections 2.3, 2.9, 3.2, 4.6, and 5.2. 


THEOREM The Invertible Matrix Theorem (concluded)

Let $A$ be an $n \times n$ matrix. Then the following statements are each equivalent to
the statement that $A$ is an invertible matrix.

u. $(\text{Col } A)^\perp = \{\mathbf{0}\}$.

v. $(\text{Nul } A)^\perp = \mathbb{R}^n$.

w. Row $A = \mathbb{R}^n$.

x. $A$ has $n$ nonzero singular values.

EXAMPLE 7 (Reduced SVD and the Pseudoinverse of A) When $\Sigma$ contains rows or
columns of zeros, a more compact decomposition of $A$ is possible. Using the notation
established above, let $r = \text{rank } A$, and partition $U$ and $V$ into submatrices whose first
blocks contain $r$ columns:

\[
\begin{aligned}
U &= [\,U_r \ \ U_{m-r}\,], \quad \text{where } U_r = [\,\mathbf{u}_1 \ \cdots \ \mathbf{u}_r\,] \\
V &= [\,V_r \ \ V_{n-r}\,], \quad \text{where } V_r = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_r\,]
\end{aligned}
\]

Then $U_r$ is $m \times r$ and $V_r$ is $n \times r$. (To simplify notation, we consider $U_{m-r}$ or $V_{n-r}$
even though one of them may have no columns.) Then partitioned matrix multiplication
shows that

\[ A = [\,U_r \ \ U_{m-r}\,] \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_r^T \\ V_{n-r}^T \end{bmatrix} = U_rDV_r^T \tag{9} \]

This factorization of $A$ is called a reduced singular value decomposition of $A$. Since
the diagonal entries in $D$ are nonzero, $D$ is invertible. The following matrix is called
the pseudoinverse (also, the Moore-Penrose inverse) of $A$:

\[ A^+ = V_rD^{-1}U_r^T \tag{10} \]

Supplementary Exercises 12-14 at the end of the chapter explore some of the properties
of the reduced singular value decomposition and the pseudoinverse. ■
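Equations (9) and (10) translate almost line for line into code. The sketch below assumes NumPy and again uses the matrix of Example 4; the rank tolerance and the `rcond` value passed to `np.linalg.pinv` are assumptions chosen so that both computations discard the same (numerically zero) singular value.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])                          # matrix of Example 4

U, s, Vt = np.linalg.svd(A, full_matrices=False)     # thin SVD
r = int(np.sum(s > 1e-10 * s[0]))                    # numerical rank (assumed tolerance)

U_r = U[:, :r]                 # m x r
D = np.diag(s[:r])             # r x r, invertible
V_r = Vt[:r].T                 # n x r

A_reduced = U_r @ D @ V_r.T                          # equation (9)
A_plus = V_r @ np.diag(1.0 / s[:r]) @ U_r.T          # equation (10)

print(np.allclose(A_reduced, A))
print(A_plus * 18)             # for this rank-1 matrix, A^+ equals A^T / 18
```

For this particular $A$, $D$ is the single number $3\sqrt{2}$, so $A^+ = A^T/\sigma_1^2 = A^T/18$; the library routine `np.linalg.pinv` returns the same matrix.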


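EXAMPLE 8 (Least-Squares Solution) Given the equation $A\mathbf{x} = \mathbf{b}$, use the
pseudoinverse of $A$ in (10) to define

\[ \hat{\mathbf{x}} = A^+\mathbf{b} = V_rD^{-1}U_r^T\mathbf{b} \]

Then, from the SVD in (9),

\[
\begin{aligned}
A\hat{\mathbf{x}} &= (U_rDV_r^T)(V_rD^{-1}U_r^T\mathbf{b}) \\
&= U_rDD^{-1}U_r^T\mathbf{b} && \text{Because } V_r^TV_r = I_r \\
&= U_rU_r^T\mathbf{b}
\end{aligned}
\]

It follows from (5) that $U_rU_r^T\mathbf{b}$ is the orthogonal projection $\hat{\mathbf{b}}$ of $\mathbf{b}$ onto Col $A$. (See
Theorem 10 in Section 6.3.) Thus $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$. In fact, this $\hat{\mathbf{x}}$
has the smallest length among all least-squares solutions of $A\mathbf{x} = \mathbf{b}$. See Supplementary
Exercise 14. ■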
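A numerical sketch of Example 8, assuming NumPy. The matrix is the one from Example 4, and the right-hand side `b` is a hypothetical vector chosen for illustration. Because this $A$ has a nontrivial null space, there are many least-squares solutions, which makes the minimum-length property visible.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-2.0, 2.0],
              [2.0, -2.0]])                 # matrix of Example 4
b = np.array([1.0, 0.0, 1.0])               # hypothetical right-hand side

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))           # numerical rank (assumed tolerance)
U_r, V_r = U[:, :r], Vt[:r].T

x_hat = V_r @ ((U_r.T @ b) / s[:r])         # x_hat = V_r D^{-1} U_r^T b
b_hat = U_r @ (U_r.T @ b)                   # orthogonal projection of b onto Col A

# Another least-squares solution: add a vector from Nul A.
null_vec = np.array([1.0, 1.0]) / np.sqrt(2.0)    # A @ null_vec = 0
x_other = x_hat + null_vec

print(x_hat)
print(np.linalg.norm(x_hat), np.linalg.norm(x_other))
```

Both `x_hat` and `x_other` map to the same projection `b_hat`, so both are least-squares solutions, but `x_hat` lies in Row $A$ and is therefore the shorter of the two. NumPy's `np.linalg.lstsq` returns this same minimum-length solution for rank-deficient problems.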


NUMERICAL NOTE

Examples 1-4 and the exercises illustrate the concept of singular values and
suggest how to perform calculations by hand. In practice, the computation of
$A^TA$ should be avoided, since any errors in the entries of $A$ are squared in the
entries of $A^TA$. There exist fast iterative methods that produce the singular values
and singular vectors of $A$ accurately to many decimal places.
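The warning about $A^TA$ can be made concrete with a small experiment, assuming NumPy and IEEE double precision. The matrix below is a hypothetical example built for this purpose: its exact singular values are $\sqrt{2 + \varepsilon^2}$ and $\varepsilon$, with $\varepsilon = 10^{-9}$. Forming $A^TA$ requires $1 + \varepsilon^2$, which rounds to exactly 1, so the information about the small singular value is lost before any eigenvalue routine runs.

```python
import numpy as np

eps = 1e-9                                   # hypothetical small entry
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])
# Exact singular values of A: sqrt(2 + eps**2) and eps.

G = A.T @ A                                  # 1 + eps**2 rounds to 1.0 in double precision
lam = np.linalg.eigvalsh(G)                  # ascending eigenvalues of the rounded matrix
sigma_gram = np.sqrt(np.clip(lam[::-1], 0.0, None))

sigma_svd = np.linalg.svd(A, compute_uv=False)   # works on A directly

print(G)                                     # every entry equals 1.0
print(sigma_gram)                            # small singular value is not recovered
print(sigma_svd)                             # small singular value is close to eps
```

The route through $A^TA$ sees only the singular matrix of all ones, while the SVD routine, which works on $A$ itself, recovers the small singular value to several significant digits.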


Further Reading 

Horn, Roger A., and Charles R. Johnson, Matrix Analysis (Cambridge: Cambridge 
University Press, 1990). 

Long, Cliff, “Visualization of Matrix Singular Value Decomposition.” Mathematics 
Magazine 56 (1983) ， pp. 161-167. 
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Find an SVD of each matrix in Exercises 5-12. [Hint: In 


Exercise 11, one choice for U is 


.In 


Exercise 12, one column of U can be 


1/^6 

- 2/^6 

l/>/6 


3. 


V6 

0 


•s/6 


V3 

0 




4. 


5. 


7. 


9. 


11 . 


-3 1 

6 -2 
6 -2 


6 . 


8 . 


10 . 


3 

2 _ 

- 2 . 

-1 

0 


12 . 


13. Find the SVD of A 


1 
0 
-1 

3 2 

2 3 


[Hint: Work with A T .] 


14. In Exercise 7, find a unit vector x at which ^4x has maximum 
length. 

15. Suppose the factorization below is an SVD of a matrix A, 
with the entries in U and V rounded to two decimal places. 


Find the singular values of the matrices in Exercises 1-4. 



.40 

-.78 

.47 


7.10 

0 

0 

A = 

.37 

-.33 

-.87 


0 

3.10 

0 


_-.84 

一 .52 

-.16_ 


0 

0 

0 


".30 

一 .51 

-.81" 





•76 

.58 


.64 

-.58 


-.12 

.58 


a. What is the rank of A1 

b. Use this decomposition of A, with no calculations, to 
write a basis for Col A and a basis for Nul A. [Hint: First 
write the columns of V.] 

16. Repeat Exercise 15 for the following SVD of a 3 x 4 matrix 
A: 


In Exercises 17-24, A is an m x n matrix with a singular value 
decomposition A = UEV T , where U is an m x m orthogonal 
matrix, S is an m x « “diagonal” matrix with r positive entries 
and no negative entries, and V is an 72 x « orthogonal matrix. 
Justify each answer. 

17. Suppose A is square and invertible. Find a singular value 
decomposition of A~ l . 

18. Show that if 4 is square, then | det^4| is the product of the 
singular values of A. 

19. Show that the columns of V are eigenvectors of A T A, the 
columns of U are eigenvectors of AA T , and the diagonal 
entries of S are the singular values of A. [Hint: Use the 
SVD to compute A T A and AA T .] 

20. Show that if A is an n x n positive definite matrix, then an 


orthogonal diagonalization A : 
decomposition of A. 


PDP 1 is a singular value 


.66 

-.03 

-.35 

.66 

.13 

-.90 

-.39 

—.13 

.65 

.08 

-.16 

-.73 

.34 

.42 

-.84 

-.08 


-.86 

-.11 

-.50 


12.48 

0 

0 

0 

.31 

•68 

-.67 


0 

6.34 

0 

0 

.41 

-.73 

-.55 


0 

0 

0 

0 


2 . 


Moler, C. B., and D. Morrison, Singular Value Analysis of Cryptograms. Amer. Math. 
Monthly 90 (1983) ， pp. 78-87. 

Strang, Gilbert, Linear Algebra and Its Applications, 4th ed. (Belmont, CA: Brooks/ 
Cole, 2005). 

Watkins, David S., Fundamentals of Matrix Computations (New York: Wiley, 1991 )， 
pp. 390-398,409421. 


PRACTICE PROBLEM 

WEB Given a singular value decomposition, A = U'EV 7 , find an SVD of A T . How are the 
- singular values of A and A T related? 

7.4 EXERCISES 


21. Show that if $P$ is an orthogonal $m \times m$ matrix, then $PA$ has
the same singular values as $A$.

22. Justify the statement in Example 2 that the second singular
value of a matrix $A$ is the maximum of $\|A\mathbf{x}\|$ as $\mathbf{x}$ varies
over all unit vectors orthogonal to $\mathbf{v}_1$, with $\mathbf{v}_1$ a right singular
vector corresponding to the first singular value of $A$. [Hint:
Use Theorem 7 in Section 7.3.]

23. Let $U = [\,\mathbf{u}_1 \ \cdots \ \mathbf{u}_m\,]$ and $V = [\,\mathbf{v}_1 \ \cdots \ \mathbf{v}_n\,]$, where the
$\mathbf{u}_i$ and $\mathbf{v}_i$ are as in Theorem 10. Show that

$$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^T$$

24. Using the notation of Exercise 23, show that $A^T\mathbf{u}_j = \sigma_j\mathbf{v}_j$
for $1 \le j \le r = \operatorname{rank} A$.

25. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Describe how
to find a basis $\mathcal{B}$ for $\mathbb{R}^n$ and a basis $\mathcal{C}$ for $\mathbb{R}^m$ such that the
matrix for $T$ relative to $\mathcal{B}$ and $\mathcal{C}$ is an $m \times n$ “diagonal”
matrix.

[M] Compute an SVD of each matrix in Exercises 26 and 27.
Report the final matrix entries accurate to two decimal places. Use
the method of Examples 3 and 4.

26. $A = \begin{bmatrix} 18 & 13 & -4 & 4 \\ 2 & 19 & -4 & 12 \\ 14 & 11 & -12 & 8 \\ -2 & 21 & 4 & 8 \end{bmatrix}$

27. $A = \begin{bmatrix} 6 & -8 & -4 & 5 & -4 \\ 2 & 7 & -5 & -6 & 4 \\ 0 & -1 & -8 & 2 & 2 \\ -1 & -2 & 4 & 4 & -8 \end{bmatrix}$

28. [M] Compute the singular values of the $4 \times 4$ matrix in
Exercise 9 in Section 2.3, and compute the condition number
$\sigma_1/\sigma_4$.

29. [M] Compute the singular values of the $5 \times 5$ matrix in
Exercise 10 in Section 2.3, and compute the condition number
$\sigma_1/\sigma_5$.


SOLUTION TO PRACTICE PROBLEM 

If $A = U\Sigma V^T$, where $\Sigma$ is $m \times n$, then $A^T = (V^T)^T\Sigma^TU^T = V\Sigma^TU^T$. This is an
SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n \times m$ “diagonal”
matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the
same nonzero singular values. [Note: If $A$ is $2 \times n$, then $AA^T$ is only $2 \times 2$ and its
eigenvalues may be easier to compute (by hand) than the eigenvalues of $A^TA$.]


7.5 APPLICATIONS TO IMAGE PROCESSING AND STATISTICS 

The satellite photographs in this chapter’s introduction provide an example of
multidimensional, or multivariate, data: information organized so that each datum in the data
set is identified with a point (vector) in $\mathbb{R}^n$. The main goal of this section is to explain a
technique, called principal component analysis, used to analyze such multivariate data.
The calculations will illustrate the use of orthogonal diagonalization and the singular
value decomposition.

Principal component analysis can be applied to any data that consist of lists of 
measurements made on a collection of objects or individuals. For instance, consider a 
chemical process that produces a plastic material. To monitor the process, 300 samples 
are taken of the material produced, and each sample is subjected to a battery of eight 
tests, such as melting point, density, ductility, tensile strength, and so on. The laboratory 
report for each sample is a vector in $\mathbb{R}^8$, and the set of such vectors forms an $8 \times 300$
matrix, called the matrix of observations.

Loosely speaking, we can say that the process control data are eight-dimensional. 
The next two examples describe data that can be visualized graphically. 

EXAMPLE 1 An example of two-dimensional data is given by a set of weights and
heights of $N$ college students. Let $\mathbf{X}_j$ denote the observation vector in $\mathbb{R}^2$ that lists the
weight and height of the $j$th student. If $w$ denotes weight and $h$ height, then the matrix
of observations has the form

$$\begin{bmatrix} w_1 & w_2 & \cdots & w_N \\ h_1 & h_2 & \cdots & h_N \end{bmatrix}$$

whose columns are the observation vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_N$.

The set of observation vectors can be visualized as a two-dimensional scatter plot. See
Fig. 1. ■

FIGURE 1 A scatter plot of observation vectors $\mathbf{X}_1, \ldots, \mathbf{X}_N$ (axes $w$ and $h$).

FIGURE 2 A scatter plot of spectral data for a satellite image.

FIGURE 3 Weight-height data in mean-deviation form (axes $\hat{w}$ and $\hat{h}$).


EXAMPLE 2 The first three photographs of Railroad Valley, Nevada, shown in the 
chapter introduction, can be viewed as one image of the region, with three spectral 
components, because simultaneous measurements of the region were made at three 
separate wavelengths. Each photograph gives different information about the same 
physical region. For instance, the first pixel in the upper-left corner of each photograph 
corresponds to the same place on the ground (about 30 meters by 30 meters). To each 
pixel there corresponds an observation vector in $\mathbb{R}^3$ that lists the signal intensities for
that pixel in the three spectral bands. 

Typically, the image is 2000 x 2000 pixels, so there are 4 million pixels in the 
image. The data for the image form a matrix with 3 rows and 4 million columns 
(with columns arranged in any convenient order). In this case, the “multidimensional” 
character of the data refers to the three spectral dimensions rather than the two spatial 
dimensions that naturally belong to any photograph. The data can be visualized as a 
cluster of 4 million points in $\mathbb{R}^3$, perhaps as in Fig. 2. ■

Mean and Covariance 

To prepare for principal component analysis, let $[\,\mathbf{X}_1 \ \cdots \ \mathbf{X}_N\,]$ be a $p \times N$ matrix of
observations, such as described above. The sample mean, $\mathbf{M}$, of the observation vectors
$\mathbf{X}_1, \ldots, \mathbf{X}_N$ is given by

$$\mathbf{M} = \frac{1}{N}(\mathbf{X}_1 + \cdots + \mathbf{X}_N)$$

For the data in Fig. 1, the sample mean is the point in the “center” of the scatter plot.
For $k = 1, \ldots, N$, let

$$\hat{\mathbf{X}}_k = \mathbf{X}_k - \mathbf{M}$$

The columns of the $p \times N$ matrix

$$B = [\,\hat{\mathbf{X}}_1 \ \ \hat{\mathbf{X}}_2 \ \cdots \ \hat{\mathbf{X}}_N\,]$$

have a zero sample mean, and $B$ is said to be in mean-deviation form. When the
sample mean is subtracted from the data in Fig. 1, the resulting scatter plot has the form
in Fig. 3.











The (sample) covariance matrix is the $p \times p$ matrix $S$ defined by

$$S = \frac{1}{N-1}BB^T$$

Since any matrix of the form $BB^T$ is positive semidefinite, so is $S$. (See Exercise 25 in
Section 7.2 with $B$ and $B^T$ interchanged.)

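EXAMPLE 3 Three measurements are made on each of four individuals in a random
sample from a population. The observation vectors are

$$\mathbf{X}_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix},\quad
\mathbf{X}_2 = \begin{bmatrix} 4 \\ 2 \\ 13 \end{bmatrix},\quad
\mathbf{X}_3 = \begin{bmatrix} 7 \\ 8 \\ 1 \end{bmatrix},\quad
\mathbf{X}_4 = \begin{bmatrix} 8 \\ 4 \\ 5 \end{bmatrix}$$

Compute the sample mean and the covariance matrix.

SOLUTION The sample mean is

$$\mathbf{M} = \frac{1}{4}\left(\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 4 \\ 2 \\ 13 \end{bmatrix} + \begin{bmatrix} 7 \\ 8 \\ 1 \end{bmatrix} + \begin{bmatrix} 8 \\ 4 \\ 5 \end{bmatrix}\right)
= \frac{1}{4}\begin{bmatrix} 20 \\ 16 \\ 20 \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \\ 5 \end{bmatrix}$$

Subtract the sample mean from $\mathbf{X}_1, \ldots, \mathbf{X}_4$ to obtain

$$\hat{\mathbf{X}}_1 = \begin{bmatrix} -4 \\ -2 \\ -4 \end{bmatrix},\quad
\hat{\mathbf{X}}_2 = \begin{bmatrix} -1 \\ -2 \\ 8 \end{bmatrix},\quad
\hat{\mathbf{X}}_3 = \begin{bmatrix} 2 \\ 4 \\ -4 \end{bmatrix},\quad
\hat{\mathbf{X}}_4 = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}$$

and

$$B = \begin{bmatrix} -4 & -1 & 2 & 3 \\ -2 & -2 & 4 & 0 \\ -4 & 8 & -4 & 0 \end{bmatrix}$$

The sample covariance matrix is

$$S = \frac{1}{3}\begin{bmatrix} -4 & -1 & 2 & 3 \\ -2 & -2 & 4 & 0 \\ -4 & 8 & -4 & 0 \end{bmatrix}
\begin{bmatrix} -4 & -2 & -4 \\ -1 & -2 & 8 \\ 2 & 4 & -4 \\ 3 & 0 & 0 \end{bmatrix}
= \frac{1}{3}\begin{bmatrix} 30 & 18 & 0 \\ 18 & 24 & -24 \\ 0 & -24 & 96 \end{bmatrix}
= \begin{bmatrix} 10 & 6 & 0 \\ 6 & 8 & -8 \\ 0 & -8 & 32 \end{bmatrix}$$

■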
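The arithmetic in Example 3 can be checked with a short script. The following is a minimal sketch in plain Python (no matrix library); the function names `sample_mean`, `mean_deviation_form`, and `covariance_matrix` are illustrative choices, not notation from the text.

```python
# Observation vectors from Example 3, one tuple per individual.
observations = [(1, 2, 1), (4, 2, 13), (7, 8, 1), (8, 4, 5)]

def sample_mean(vectors):
    """Average the observation vectors coordinate by coordinate."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def mean_deviation_form(vectors):
    """Subtract the sample mean from every observation vector."""
    m = sample_mean(vectors)
    return [[x - mi for x, mi in zip(v, m)] for v in vectors]

def covariance_matrix(vectors):
    """Return S = (1/(N-1)) B B^T, where the columns of B are the centered observations."""
    centered = mean_deviation_form(vectors)
    n, p = len(centered), len(centered[0])
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

M = sample_mean(observations)
S = covariance_matrix(observations)
print(M)  # [5.0, 4.0, 5.0]
print(S)  # [[10.0, 6.0, 0.0], [6.0, 8.0, -8.0], [0.0, -8.0, 32.0]]
```

Each entry of $S$ is computed as a sum over the observations, which is the same as the $(i, j)$-entry of $BB^T$ divided by $N - 1$.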


To discuss the entries in $S = [s_{ij}]$, let $\mathbf{X}$ represent a vector that varies over the set
of observation vectors and denote the coordinates of $\mathbf{X}$ by $x_1, \ldots, x_p$. Then $x_1$, for
example, is a scalar that varies over the set of first coordinates of $\mathbf{X}_1, \ldots, \mathbf{X}_N$. For
$j = 1, \ldots, p$, the diagonal entry $s_{jj}$ in $S$ is called the variance of $x_j$.

The variance of $x_j$ measures the spread of the values of $x_j$. (See Exercise 13.) In
Example 3, the variance of $x_1$ is 10 and the variance of $x_3$ is 32. The fact that 32 is more
than 10 indicates that the set of third entries in the response vectors contains a wider
spread of values than the set of first entries.

The total variance of the data is the sum of the variances on the diagonal of $S$. In
general, the sum of the diagonal entries of a square matrix $S$ is called the trace of the
matrix, written $\operatorname{tr}(S)$. Thus

$$\{\text{total variance}\} = \operatorname{tr}(S)$$

The entry $s_{ij}$ in $S$ for $i \ne j$ is called the covariance of $x_i$ and $x_j$. Observe that
in Example 3, the covariance between $x_1$ and $x_3$ is 0 because the $(1, 3)$-entry in $S$ is
0. Statisticians say that $x_1$ and $x_3$ are uncorrelated. Analysis of the multivariate data
in $\mathbf{X}_1, \ldots, \mathbf{X}_N$ is greatly simplified when most or all of the variables $x_1, \ldots, x_p$ are
uncorrelated, that is, when the covariance matrix of $\mathbf{X}_1, \ldots, \mathbf{X}_N$ is diagonal or nearly
diagonal.

Principal Component Analysis 

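For simplicity, assume that the matrix $[\,\mathbf{X}_1 \ \cdots \ \mathbf{X}_N\,]$ is already in mean-deviation
form. The goal of principal component analysis is to find an orthogonal $p \times p$ matrix
$P = [\,\mathbf{u}_1 \ \cdots \ \mathbf{u}_p\,]$ that determines a change of variable, $\mathbf{X} = P\mathbf{Y}$, or

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}
= [\,\mathbf{u}_1 \ \ \mathbf{u}_2 \ \cdots \ \mathbf{u}_p\,]
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix}$$

with the property that the new variables $y_1, \ldots, y_p$ are uncorrelated and are arranged in
order of decreasing variance.

The orthogonal change of variable $\mathbf{X} = P\mathbf{Y}$ means that each observation vector
$\mathbf{X}_k$ receives a “new name,” $\mathbf{Y}_k$, such that $\mathbf{X}_k = P\mathbf{Y}_k$. Notice that $\mathbf{Y}_k$ is the coordinate
vector of $\mathbf{X}_k$ with respect to the columns of $P$, and $\mathbf{Y}_k = P^{-1}\mathbf{X}_k = P^T\mathbf{X}_k$ for
$k = 1, \ldots, N$.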

It is not difficult to verify that for any orthogonal $P$, the covariance matrix of
$\mathbf{Y}_1, \ldots, \mathbf{Y}_N$ is $P^TSP$ (Exercise 11). So the desired orthogonal matrix $P$ is one that
makes $P^TSP$ diagonal. Let $D$ be a diagonal matrix with the eigenvalues $\lambda_1, \ldots, \lambda_p$
of $S$ on the diagonal, arranged so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$, and let $P$ be an
orthogonal matrix whose columns are the corresponding unit eigenvectors $\mathbf{u}_1, \ldots, \mathbf{u}_p$.
Then $S = PDP^T$ and $P^TSP = D$.
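A brief sketch of why these two facts hold, written out as a worked equation. It assumes that $B$ denotes the mean-deviation matrix $[\,\mathbf{X}_1 \ \cdots \ \mathbf{X}_N\,]$, so that the transformed observations $\mathbf{Y}_k = P^T\mathbf{X}_k$ are the columns of $P^TB$ and are again in mean-deviation form; the complete argument is the subject of Exercise 11.

```latex
\frac{1}{N-1}\,(P^T B)(P^T B)^T
  = \frac{1}{N-1}\,P^T B B^T P
  = P^T\Big(\frac{1}{N-1}\,B B^T\Big)P
  = P^T S P,
\qquad
P^T S P = P^T (P D P^T) P = (P^T P)\,D\,(P^T P) = D .
```

The second chain uses only $P^TP = I$, which holds because $P$ is orthogonal.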

The unit eigenvectors $\mathbf{u}_1, \ldots, \mathbf{u}_p$ of the covariance matrix $S$ are called the principal
components of the data (in the matrix of observations). The first principal component
is the eigenvector corresponding to the largest eigenvalue of $S$, the second principal
component is the eigenvector corresponding to the second largest eigenvalue, and so
on.

The first principal component $\mathbf{u}_1$ determines the new variable $y_1$ in the following
way. Let $c_1, \ldots, c_p$ be the entries in $\mathbf{u}_1$. Since $\mathbf{u}_1^T$ is the first row of $P^T$, the equation
$\mathbf{Y} = P^T\mathbf{X}$ shows that

$$y_1 = \mathbf{u}_1^T\mathbf{X} = c_1x_1 + c_2x_2 + \cdots + c_px_p$$

Thus $y_1$ is a linear combination of the original variables $x_1, \ldots, x_p$, using the entries in
the eigenvector $\mathbf{u}_1$ as weights. In a similar fashion, $\mathbf{u}_2$ determines the variable $y_2$, and
so on.

EXAMPLE 4 The initial data for the multispectral image of Railroad Valley
(Example 2) consisted of 4 million vectors in $\mathbb{R}^3$. The associated covariance matrix is$^1$

$$S = \begin{bmatrix} 2382.78 & 2611.84 & 2136.20 \\ 2611.84 & 3106.47 & 2553.90 \\ 2136.20 & 2553.90 & 2650.71 \end{bmatrix}$$

Find the principal components of the data, and list the new variable determined by the
first principal component.

1 Data for Example 4 and Exercises 5 and 6 were provided by Earth Satellite Corporation, Rockville,
Maryland.

SOLUTION The eigenvalues of $S$ and the associated principal components (the unit
eigenvectors) are

$$\lambda_1 = 7614.23 \qquad \lambda_2 = 427.63 \qquad \lambda_3 = 98.10$$

$$\mathbf{u}_1 = \begin{bmatrix} .5417 \\ .6295 \\ .5570 \end{bmatrix} \qquad
\mathbf{u}_2 = \begin{bmatrix} -.4894 \\ -.3026 \\ .8179 \end{bmatrix} \qquad
\mathbf{u}_3 = \begin{bmatrix} .6834 \\ -.7157 \\ .1441 \end{bmatrix}$$

Using two decimal places for simplicity, the variable for the first principal component
is

$$y_1 = .54x_1 + .63x_2 + .56x_3$$

This equation was used to create photograph (d) in the chapter introduction. The
variables $x_1, x_2, x_3$ are the signal intensities in the three spectral bands. The values of $x_1$,
converted to a gray scale between black and white, produced photograph (a). Similarly,
the values of $x_2$ and $x_3$ produced photographs (b) and (c), respectively. At each pixel in
photograph (d), the gray scale value is computed from $y_1$, a weighted linear combination
of $x_1, x_2, x_3$. In this sense, photograph (d) “displays” the first principal component of
the data. ■
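The first principal component in Example 4 can be approximated without a matrix package. The sketch below uses power iteration, which is a stand-in technique chosen here for simplicity and not the method the text relies on; the helper names are illustrative, and the step count of 100 is an arbitrary assumption that is generous for this matrix because $\lambda_2/\lambda_1$ is small.

```python
import math

# Covariance matrix S from Example 4.
S = [
    [2382.78, 2611.84, 2136.20],
    [2611.84, 3106.47, 2553.90],
    [2136.20, 2553.90, 2650.71],
]

def mat_vec(A, x):
    """Multiply the matrix A by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def normalize(x):
    """Scale x to unit length."""
    length = math.sqrt(sum(xi * xi for xi in x))
    return [xi / length for xi in x]

def power_iteration(A, steps=100):
    """Approximate the largest eigenvalue of a symmetric matrix A and a unit eigenvector."""
    u = normalize([1.0] * len(A))
    for _ in range(steps):
        u = normalize(mat_vec(A, u))
    # For a unit vector u, the Rayleigh quotient u^T A u estimates the eigenvalue.
    eigenvalue = sum(ui * vi for ui, vi in zip(u, mat_vec(A, u)))
    return eigenvalue, u

lambda1, u1 = power_iteration(S)
print(round(lambda1, 2), [round(c, 4) for c in u1])
```

The printed values should agree with $\lambda_1$ and $\mathbf{u}_1$ above up to the rounding of the entries of $S$.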

In Example 4, the covariance matrix for the transformed data, using variables $y_1$,
$y_2$, $y_3$, is

$$D = \begin{bmatrix} 7614.23 & 0 & 0 \\ 0 & 427.63 & 0 \\ 0 & 0 & 98.10 \end{bmatrix}$$

Although $D$ is obviously simpler than the original covariance matrix $S$, the merit of
constructing the new variables is not yet apparent. However, the variances of the
variables $y_1, y_2, y_3$ appear on the diagonal of $D$, and obviously the first variance in
$D$ is much larger than the other two. As we shall see, this fact will permit us to view
the data as essentially one-dimensional rather than three-dimensional.


Reducing the Dimension of Multivariate Data 

Principal component analysis is potentially valuable for applications in which most of
the variation, or dynamic range, in the data is due to variations in only a few of the new
variables, $y_1, \ldots, y_p$.

It can be shown that an orthogonal change of variables, $\mathbf{X} = P\mathbf{Y}$, does not change
the total variance of the data. (Roughly speaking, this is true because left-multiplication
by $P$ does not change the lengths of vectors or the angles between them. See Exercise
12.) This means that if $S = PDP^T$, then

$$\left\{\begin{matrix}\text{total variance}\\ \text{of } x_1, \ldots, x_p\end{matrix}\right\}
= \left\{\begin{matrix}\text{total variance}\\ \text{of } y_1, \ldots, y_p\end{matrix}\right\}
= \operatorname{tr}(D) = \lambda_1 + \cdots + \lambda_p$$

The variance of $y_j$ is $\lambda_j$, and the quotient $\lambda_j / \operatorname{tr}(S)$ measures the fraction of the total
variance that is “explained” or “captured” by $y_j$.


EXAMPLE 5 Compute the various percentages of variance of the Railroad Valley
multispectral data that are displayed in the principal component photographs, (d)-(f),
shown in the chapter introduction.

SOLUTION The total variance of the data is

$$\operatorname{tr}(D) = 7614.23 + 427.63 + 98.10 = 8139.96$$

[Verify that this number also equals $\operatorname{tr}(S)$.] The percentages of the total variance
explained by the principal components are

First component:   7614.23 / 8139.96 = 93.5%
Second component:   427.63 / 8139.96 =  5.3%
Third component:     98.10 / 8139.96 =  1.2%

In a sense, 93.5% of the information collected by Landsat for the Railroad Valley region
is displayed in photograph (d), with 5.3% in (e) and only 1.2% remaining for (f). ■
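The percentages in Example 5 are simple quotients $\lambda_j / \operatorname{tr}(D)$. A minimal sketch of the computation, using the eigenvalues quoted in Example 4 (the variable names are illustrative):

```python
# Eigenvalues of the covariance matrix from Example 4.
eigenvalues = [7614.23, 427.63, 98.10]

# The total variance is tr(D), the sum of the eigenvalues.
total_variance = sum(eigenvalues)

# Fraction of the total variance explained by each principal component, in percent.
percentages = [100 * lam / total_variance for lam in eigenvalues]

for name, pct in zip(["first", "second", "third"], percentages):
    print(f"{name} component: {pct:.1f}%")
# first component: 93.5%
# second component: 5.3%
# third component: 1.2%
```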

The calculations in Example 5 show that the data have practically no variance in
the third (new) coordinate. The values of $y_3$ are all close to zero. Geometrically, the
data points lie nearly in the plane $y_3 = 0$, and their locations can be determined fairly
accurately by knowing only the values of $y_1$ and $y_2$. In fact, $y_2$ also has relatively small
variance, which means that the points lie approximately along a line, and the data are
essentially one-dimensional. See Fig. 2, in which the data resemble a popsicle stick.


Characterizations of Principal Component Variables 

If $y_1, \ldots, y_p$ arise from a principal component analysis of a $p \times N$ matrix of
observations, then the variance of $y_1$ is as large as possible in the following sense: If $\mathbf{u}$ is
any unit vector and if $y = \mathbf{u}^T\mathbf{X}$, then the variance of the values of $y$ as $\mathbf{X}$ varies over
the original data $\mathbf{X}_1, \ldots, \mathbf{X}_N$ turns out to be $\mathbf{u}^TS\mathbf{u}$. By Theorem 8 in Section 7.3, the
maximum value of $\mathbf{u}^TS\mathbf{u}$, over all unit vectors $\mathbf{u}$, is the largest eigenvalue $\lambda_1$ of $S$, and
this variance is attained when $\mathbf{u}$ is the corresponding eigenvector $\mathbf{u}_1$. In the same way,
Theorem 8 shows that $y_2$ has maximum possible variance among all variables $y = \mathbf{u}^T\mathbf{X}$
that are uncorrelated with $y_1$. Likewise, $y_3$ has maximum possible variance among all
variables uncorrelated with both $y_1$ and $y_2$, and so on.
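The claim that the variance of $y = \mathbf{u}^T\mathbf{X}$ equals $\mathbf{u}^TS\mathbf{u}$ can be spot-checked numerically. The sketch below reuses the mean-deviation data and covariance matrix of Example 3; the unit vector $\mathbf{u}$ is an arbitrary choice made for illustration, and the function names are hypothetical.

```python
import math

# Mean-deviation observations and covariance matrix from Example 3.
centered = [(-4, -2, -4), (-1, -2, 8), (2, 4, -4), (3, 0, 0)]
S = [[10, 6, 0], [6, 8, -8], [0, -8, 32]]

def variance_along(u, vectors):
    """Sample variance of y = u^T X over observation vectors already in mean-deviation form."""
    ys = [sum(ui * xi for ui, xi in zip(u, x)) for x in vectors]
    return sum(y * y for y in ys) / (len(vectors) - 1)

def quadratic_form(u, A):
    """Evaluate u^T A u."""
    n = len(u)
    return sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))

u = [1 / math.sqrt(3)] * 3  # an arbitrary unit vector chosen for illustration
print(variance_along(u, centered), quadratic_form(u, S))
```

Both printed numbers should equal $46/3$ up to floating-point rounding.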


NUMERICAL NOTE

The singular value decomposition is the main tool for performing principal
component analysis in practical applications. If $B$ is a $p \times N$ matrix of observations
in mean-deviation form, and if $A = (1/\sqrt{N-1}\,)B^T$, then $A^TA$ is the covariance
matrix, $S$. The squares of the singular values of $A$ are the $p$ eigenvalues of $S$,
and the right singular vectors of $A$ are the principal components of the data.

As mentioned in Section 7.4, iterative calculation of the SVD of $A$ is faster
and more accurate than an eigenvalue decomposition of $S$. This is particularly
true, for instance, in the hyperspectral image processing (with $p = 224$)
mentioned in the chapter introduction. Principal component analysis is completed in
seconds on specialized workstations.
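The two claims in the note follow from a short calculation, sketched here under the assumption that $A = U\Sigma V^T$ is a singular value decomposition of $A$ with singular values $\sigma_1, \ldots, \sigma_p$ (some possibly zero):

```latex
A^T A = \Big(\tfrac{1}{\sqrt{N-1}}\,B^T\Big)^{T}\Big(\tfrac{1}{\sqrt{N-1}}\,B^T\Big)
      = \frac{1}{N-1}\,B B^T = S,
\qquad
S = A^T A = V\,\Sigma^T U^T U\,\Sigma\,V^T = V\,(\Sigma^T\Sigma)\,V^T .
```

Since $\Sigma^T\Sigma$ is the $p \times p$ diagonal matrix with entries $\sigma_1^2, \ldots, \sigma_p^2$, the last expression is an orthogonal diagonalization of $S$: the columns of $V$ are unit eigenvectors of $S$ and the $\sigma_j^2$ are its eigenvalues.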


Further Reading 

Lillesand, Thomas M., and Ralph W. Kiefer, Remote Sensing and Image Interpretation,
4th ed. (New York: John Wiley, 2000).

PRACTICE PROBLEMS

The following table lists the weights and heights of five boys:

Boy            #1    #2    #3    #4    #5
Weight (lb)    120   125   125   135   145
Height (in.)    61    60    64    68    72

1. Find the covariance matrix for the data.

2. Make a principal component analysis of the data to find a single size index that
explains most of the variation in the data.


7.5 EXERCISES 


In Exercises 1 and 2, convert the matrix of observations to mean- 
deviation form, and construct the sample covariance matrix. 


19 22 6 3 2 20 

12 6 9 15 13 5 


2 . 


5 2 6 7 3 

11 6 8 15 11 


3. Find the principal components of the data for Exercise 1.

4. Find the principal components of the data for Exercise 2.

5. [M] A Landsat image with three spectral components was
made of Homestead Air Force Base in Florida (after the
base was hit by Hurricane Andrew in 1992). The covariance
matrix of the data is shown below. Find the first principal
component of the data, and compute the percentage of the
total variance that is contained in this component.

$$S = \begin{bmatrix} 164.12 & 32.73 & 81.04 \\ 32.73 & 539.44 & 249.13 \\ 81.04 & 249.13 & 189.11 \end{bmatrix}$$

6. [M] The covariance matrix below was obtained from a
Landsat image of the Columbia River in Washington, using
data from three spectral bands. Let $x_1, x_2, x_3$ denote the
spectral components of each pixel in the image. Find a
new variable of the form $y_1 = c_1x_1 + c_2x_2 + c_3x_3$ that has
maximum possible variance, subject to the constraint that
$c_1^2 + c_2^2 + c_3^2 = 1$. What percentage of the total variance in
the data is explained by $y_1$?

$$S = \begin{bmatrix} 29.64 & 18.38 & 5.00 \\ 18.38 & 20.82 & 14.06 \\ 5.00 & 14.06 & 29.21 \end{bmatrix}$$

7. Let $x_1, x_2$ denote the variables for the two-dimensional
data in Exercise 1. Find a new variable $y_1$ of the form
$y_1 = c_1x_1 + c_2x_2$, with $c_1^2 + c_2^2 = 1$, such that $y_1$ has
maximum possible variance over the given data. How much of
the variance in the data is explained by $y_1$?

8. Repeat Exercise 7 for the data in Exercise 2.

9. Suppose three tests are administered to a random sample
of college students. Let $\mathbf{X}_1, \ldots, \mathbf{X}_N$ be observation vectors
in $\mathbb{R}^3$ that list the three scores of each student, and for
$j = 1, 2, 3$, let $x_j$ denote a student’s score on the $j$th exam.
Suppose the covariance matrix of the data is

$$S = \begin{bmatrix} 5 & 2 & 0 \\ 2 & 6 & 2 \\ 0 & 2 & 7 \end{bmatrix}$$

Let $y$ be an “index” of student performance, with $y =
c_1x_1 + c_2x_2 + c_3x_3$ and $c_1^2 + c_2^2 + c_3^2 = 1$. Choose $c_1, c_2, c_3$
so that the variance of $y$ over the data set is as large as
possible. [Hint: The eigenvalues of the sample covariance
matrix are $\lambda = 3, 6$, and $9$.]

10. [M] Repeat Exercise 9 with $S = \begin{bmatrix} 5 & 4 & 2 \\ 4 & 11 & 4 \\ 2 & 4 & 5 \end{bmatrix}$.

11. Given multivariate data $\mathbf{X}_1, \ldots, \mathbf{X}_N$ (in $\mathbb{R}^p$) in
mean-deviation form, let $P$ be a $p \times p$ matrix, and define
$\mathbf{Y}_k = P^T\mathbf{X}_k$ for $k = 1, \ldots, N$.

a. Show that $\mathbf{Y}_1, \ldots, \mathbf{Y}_N$ are in mean-deviation form. [Hint:
Let $\mathbf{w}$ be the vector in $\mathbb{R}^N$ with a 1 in each entry. Then
$[\,\mathbf{X}_1 \ \cdots \ \mathbf{X}_N\,]\,\mathbf{w} = \mathbf{0}$ (the zero vector in $\mathbb{R}^p$).]

b. Show that if the covariance matrix of $\mathbf{X}_1, \ldots, \mathbf{X}_N$ is $S$,
then the covariance matrix of $\mathbf{Y}_1, \ldots, \mathbf{Y}_N$ is $P^TSP$.

12. Let $\mathbf{X}$ denote a vector that varies over the columns of a $p \times N$
matrix of observations, and let $P$ be a $p \times p$ orthogonal
matrix. Show that the change of variable $\mathbf{X} = P\mathbf{Y}$ does not
change the total variance of the data. [Hint: By Exercise 11,
it suffices to show that $\operatorname{tr}(P^TSP) = \operatorname{tr}(S)$. Use a property
of the trace mentioned in Exercise 25 in Section 5.4.]


13. The sample covariance matrix is a generalization of a formula
for the variance of a sample of $N$ scalar measurements, say,
$t_1, \ldots, t_N$. If $m$ is the average of $t_1, \ldots, t_N$, then the sample
variance is given by

$$\frac{1}{N-1}\sum_{k=1}^{N}(t_k - m)^2 \qquad (1)$$

Show how the sample covariance matrix, $S$, defined prior
to Example 3, may be written in a form similar to (1).
[Hint: Use partitioned matrix multiplication to write $S$ as
$1/(N-1)$ times the sum of $N$ matrices of size $p \times p$. For
$1 \le k \le N$, write $\mathbf{X}_k - \mathbf{M}$ in place of $\hat{\mathbf{X}}_k$.]


SOLUTIONS TO PRACTICE PROBLEMS 


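1. First arrange the data in mean-deviation form. The sample mean vector is easily
seen to be $\mathbf{M} = \begin{bmatrix} 130 \\ 65 \end{bmatrix}$. Subtract $\mathbf{M}$ from the observation vectors (the columns in
the table) and obtain

$$B = \begin{bmatrix} -10 & -5 & -5 & 5 & 15 \\ -4 & -5 & -1 & 3 & 7 \end{bmatrix}$$

Then the sample covariance matrix is

$$S = \frac{1}{5-1}\begin{bmatrix} -10 & -5 & -5 & 5 & 15 \\ -4 & -5 & -1 & 3 & 7 \end{bmatrix}
\begin{bmatrix} -10 & -4 \\ -5 & -5 \\ -5 & -1 \\ 5 & 3 \\ 15 & 7 \end{bmatrix}
= \frac{1}{4}\begin{bmatrix} 400 & 190 \\ 190 & 100 \end{bmatrix}
= \begin{bmatrix} 100.0 & 47.5 \\ 47.5 & 25.0 \end{bmatrix}$$

2. The eigenvalues of $S$ are (to two decimal places)

$$\lambda_1 = 123.02 \quad \text{and} \quad \lambda_2 = 1.98$$

The unit eigenvector corresponding to $\lambda_1$ is $\mathbf{u} = \begin{bmatrix} .900 \\ .436 \end{bmatrix}$. (Since $S$ is $2 \times 2$, the
computations can be done by hand if a matrix program is not available.) For the size
index, set

$$y = .900\hat{w} + .436\hat{h}$$

where $\hat{w}$ and $\hat{h}$ are weight and height, respectively, in mean-deviation form. The
variance of this index over the data set is 123.02. Because the total variance is
$\operatorname{tr}(S) = 100 + 25 = 125$, the size index accounts for practically all (98.4%) of the
variance of the data.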
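The hand computation mentioned in the solution can be mirrored in a few lines. This is a minimal sketch for a symmetric $2 \times 2$ matrix with a nonzero off-diagonal entry, using the characteristic equation $\lambda^2 - \operatorname{tr}(S)\lambda + \det S = 0$; the function name `symmetric_2x2_eigen` is a hypothetical helper, not something from the text.

```python
import math

# Covariance matrix from Practice Problem 1.
S = [[100.0, 47.5], [47.5, 25.0]]

def symmetric_2x2_eigen(A):
    """Return (lambda1, lambda2, u1) for a symmetric 2x2 matrix with A[0][1] != 0."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = math.sqrt(trace * trace - 4 * det)
    lam1, lam2 = (trace + root) / 2, (trace - root) / 2
    # The first row of (A - lam1*I) v = 0 is satisfied by v = (A[0][1], lam1 - A[0][0]).
    v = (A[0][1], lam1 - A[0][0])
    length = math.hypot(*v)
    return lam1, lam2, (v[0] / length, v[1] / length)

lam1, lam2, u = symmetric_2x2_eigen(S)
print(round(lam1, 2), round(lam2, 2))         # 123.02 1.98
print(round(u[0], 3), round(u[1], 3))         # 0.9 0.436
print(round(100 * lam1 / (lam1 + lam2), 1))   # 98.4
```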

The original data for Practice Problem 1 and the line determined by the first
principal component $\mathbf{u}$ are shown in Fig. 4. (In parametric vector form, the line is
$\mathbf{x} = \mathbf{M} + t\mathbf{u}$.) It can be shown that the line is the best approximation to the data,
in the sense that the sum of the squares of the orthogonal distances to the line is
minimized. In fact, principal component analysis is equivalent to what is termed
orthogonal regression, but that is a story for another day. Perhaps we’ll meet again.

FIGURE 4 An orthogonal regression line determined by the
first principal component of the data. (Horizontal axis: $w$, in pounds, 120 to 150;
vertical axis: $h$, in inches, 55 to 75.)


CHAPTER 7 SUPPLEMENTARY EXERCISES 


1. Mark each statement True or False. Justify each answer. In
each part, $A$ represents an $n \times n$ matrix.

a. If $A$ is orthogonally diagonalizable, then $A$ is symmetric.

b. If $A$ is an orthogonal matrix, then $A$ is symmetric.

c. If $A$ is an orthogonal matrix, then $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$
in $\mathbb{R}^n$.

d. The principal axes of a quadratic form $\mathbf{x}^TA\mathbf{x}$ can be the
columns of any matrix $P$ that diagonalizes $A$.

e. If $P$ is an $n \times n$ matrix with orthogonal columns, then
$P^T = P^{-1}$.

f. If every coefficient in a quadratic form is positive, then
the quadratic form is positive definite.

g. If $\mathbf{x}^TA\mathbf{x} > 0$ for some $\mathbf{x}$, then the quadratic form $\mathbf{x}^TA\mathbf{x}$ is
positive definite.

h. By a suitable change of variable, any quadratic form can
be changed into one with no cross-product term.

i. The largest value of a quadratic form $\mathbf{x}^TA\mathbf{x}$, for $\|\mathbf{x}\| = 1$,
is the largest entry on the diagonal of $A$.

j. The maximum value of a positive definite quadratic form
$\mathbf{x}^TA\mathbf{x}$ is the greatest eigenvalue of $A$.

k. A positive definite quadratic form can be changed into
a negative definite form by a suitable change of variable
$\mathbf{x} = P\mathbf{u}$, for some orthogonal matrix $P$.

l. An indefinite quadratic form is one whose eigenvalues
are not definite.

m. If $P$ is an $n \times n$ orthogonal matrix, then the change of
variable $\mathbf{x} = P\mathbf{u}$ transforms $\mathbf{x}^TA\mathbf{x}$ into a quadratic form
whose matrix is $P^{-1}AP$.

n. If $U$ is $m \times n$ with orthogonal columns, then $UU^T\mathbf{x}$ is
the orthogonal projection of $\mathbf{x}$ onto Col $U$.

o. If $B$ is $m \times n$ and $\mathbf{x}$ is a unit vector in $\mathbb{R}^n$, then $\|B\mathbf{x}\| \le \sigma_1$,
where $\sigma_1$ is the first singular value of $B$.

p. A singular value decomposition of an $m \times n$ matrix $B$
can be written as $B = P\Sigma Q$, where $P$ is an $m \times m$
orthogonal matrix, $Q$ is an $n \times n$ orthogonal matrix, and
$\Sigma$ is an $m \times n$ “diagonal” matrix.

q. If $A$ is $n \times n$, then $A$ and $A^TA$ have the same singular
values.

2. Let $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ be an orthonormal basis for $\mathbb{R}^n$, and let
$\lambda_1, \ldots, \lambda_n$ be any real scalars. Define

$$A = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \cdots + \lambda_n\mathbf{u}_n\mathbf{u}_n^T$$

a. Show that $A$ is symmetric.

b. Show that $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.

3. Let $A$ be an $n \times n$ symmetric matrix of rank $r$. Explain why
the spectral decomposition of $A$ represents $A$ as the sum of
$r$ rank 1 matrices.

4. Let $A$ be an $n \times n$ symmetric matrix.

a. Show that $(\text{Col } A)^\perp = \text{Nul } A$. [Hint: See Section 6.1.]

b. Show that each $\mathbf{y}$ in $\mathbb{R}^n$ can be written in the form $\mathbf{y} =
\hat{\mathbf{y}} + \mathbf{z}$, with $\hat{\mathbf{y}}$ in Col $A$ and $\mathbf{z}$ in Nul $A$.

5. Show that if $\mathbf{v}$ is an eigenvector of an $n \times n$ matrix $A$ and $\mathbf{v}$
corresponds to a nonzero eigenvalue of $A$, then $\mathbf{v}$ is in Col $A$.
[Hint: Use the definition of an eigenvector.]

6. Let $A$ be an $n \times n$ symmetric matrix. Use Exercise 5 and
an eigenvector basis for $\mathbb{R}^n$ to give a second proof of the
decomposition in Exercise 4(b).

7. Prove that an $n \times n$ matrix $A$ is positive definite if and only
if $A$ admits a Cholesky factorization, namely, $A = R^TR$ for
some invertible upper triangular matrix $R$ whose diagonal
entries are all positive. [Hint: Use a QR factorization and
Exercise 26 in Section 7.2.]

8. Use Exercise 7 to show that if $A$ is positive definite, then
$A$ has an LU factorization, $A = LU$, where $U$ has positive
pivots on its diagonal. (The converse is true, too.)

If $A$ is $m \times n$, then the matrix $G = A^TA$ is called the Gram matrix
of $A$. In this case, the entries of $G$ are the inner products of the
columns of $A$. (See Exercises 9 and 10.)

9. Show that the Gram matrix of any matrix $A$ is positive
semidefinite, with the same rank as $A$. (See the Exercises
in Section 6.5.)

10. Show that if an $n \times n$ matrix $G$ is positive semidefinite and
has rank $r$, then $G$ is the Gram matrix of some $r \times n$ matrix
$A$. This is called a rank-revealing factorization of $G$. [Hint:
Consider the spectral decomposition of $G$, and first write $G$
as $BB^T$ for an $n \times r$ matrix $B$.]

11. Prove that any $n \times n$ matrix $A$ admits a polar decomposition
of the form $A = PQ$, where $P$ is an $n \times n$ positive semidefinite
matrix with the same rank as $A$ and where $Q$ is an $n \times n$
orthogonal matrix. [Hint: Use a singular value decomposition,
$A = U\Sigma V^T$, and observe that $A = (U\Sigma U^T)(UV^T)$.]
This decomposition is used, for instance, in mechanical
engineering to model the deformation of a material. The matrix
$P$ describes the stretching or compression of the material (in
the directions of the eigenvectors of $P$), and $Q$ describes the
rotation of the material in space.

Exercises 12-14 concern an $m \times n$ matrix $A$ with a reduced
singular value decomposition, $A = U_rDV_r^T$, and the pseudoinverse
$A^+ = V_rD^{-1}U_r^T$.

12. Verify the properties of $A^+$:

a. For each $\mathbf{y}$ in $\mathbb{R}^m$, $AA^+\mathbf{y}$ is the orthogonal projection of
$\mathbf{y}$ onto Col $A$.

b. For each $\mathbf{x}$ in $\mathbb{R}^n$, $A^+A\mathbf{x}$ is the orthogonal projection of $\mathbf{x}$
onto Row $A$.

c. $AA^+A = A$ and $A^+AA^+ = A^+$.

13. Suppose the equation $A\mathbf{x} = \mathbf{b}$ is consistent, and let
$\mathbf{x}^+ = A^+\mathbf{b}$. By Exercise 23 in Section 6.3, there is exactly
one vector $\mathbf{p}$ in Row $A$ such that $A\mathbf{p} = \mathbf{b}$. The following
steps prove that $\mathbf{x}^+ = \mathbf{p}$ and $\mathbf{x}^+$ is the minimum length
solution of $A\mathbf{x} = \mathbf{b}$.

a. Show that $\mathbf{x}^+$ is in Row $A$. [Hint: Write $\mathbf{b}$ as $A\mathbf{x}$ for some
$\mathbf{x}$, and use Exercise 12.]

b. Show that $\mathbf{x}^+$ is a solution of $A\mathbf{x} = \mathbf{b}$.

c. Show that if $\mathbf{u}$ is any solution of $A\mathbf{x} = \mathbf{b}$, then
$\|\mathbf{x}^+\| \le \|\mathbf{u}\|$, with equality only if $\mathbf{u} = \mathbf{x}^+$.

14. Given any $\mathbf{b}$ in $\mathbb{R}^m$, adapt Exercise 13 to show that $A^+\mathbf{b}$ is the
least-squares solution of minimum length. [Hint: Consider
the equation $A\mathbf{x} = \hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ is the orthogonal projection of
$\mathbf{b}$ onto Col $A$.]

[M] In Exercises 15 and 16, construct the pseudoinverse of $A$.
Begin by using a matrix program to produce the SVD of $A$, or, if that
is not available, begin with an orthogonal diagonalization of $A^TA$.
Use the pseudoinverse to solve $A\mathbf{x} = \mathbf{b}$, for $\mathbf{b} = (6, -1, -4, 6)$,
and let $\hat{\mathbf{x}}$ be the solution. Make a calculation to verify that $\hat{\mathbf{x}}$
is in Row $A$. Find a nonzero vector $\mathbf{u}$ in Nul $A$, and verify that
$\|\hat{\mathbf{x}}\| < \|\hat{\mathbf{x}} + \mathbf{u}\|$, which must be true by Exercise 13(c).

-3 -3 -6 6 1

0 0 -1 1 -1
4 0 -1 -2 0

6 0 -3 -6 0


8 The Geometry of Vector Spaces

INTRODUCTORY EXAMPLE 

The Platonic Solids 

In the city of Athens in 387 B.C., the Greek philosopher 
Plato founded an Academy, sometimes referred to as the 
world’s first university. While the curriculum included 
astronomy, biology, political theory, and philosophy, the 
subject closest to his heart was geometry. Indeed, inscribed 
over the doors of his academy were these words: “Let no 
one destitute of geometry enter my doors.” 

The Greeks were greatly impressed by geometric 
patterns such as the regular solids. A polyhedron is called 
regular if its faces are congruent regular polygons and all 
the angles at the vertices are equal. As early as 150 years 
before Euclid, the Pythagoreans knew at least three of the 
regular solids: the tetrahedron (4 triangular faces), the cube 
(6 square faces), and the octahedron (8 triangular faces). 
(See Fig. 1.) These shapes occur naturally as crystals of 
common minerals. There are only five such regular solids, 
the remaining two being the dodecahedron (12 pentagonal 
faces) and the icosahedron (20 triangular faces). 

Plato discussed these five solids in his dialogue Timaeus,
and since then they have carried his name: the Platonic
solids.

For centuries there was no need to envision geometric 
objects in more than three dimensions. But nowadays 
mathematicians regularly deal with objects in vector spaces 


having four, five, or even hundreds of dimensions. It is not 
necessarily clear what geometrical properties one might 
ascribe to these objects in higher dimensions. 

For example, what properties do lines have in 2-space
and planes have in 3-space that would be useful
in higher dimensions? How can one characterize such
objects? Sections 8.1 and 8.4 provide some answers.
The hyperplanes of Section 8.4 will be important for
understanding the multi-dimensional nature of the linear
programming problems in Chapter 9.

What would the analogue of a polyhedron “look
like” in more than three dimensions? A partial answer
is provided by two-dimensional projections of the
four-dimensional object, created in a manner analogous to
two-dimensional projections of a three-dimensional object.
Section 8.5 illustrates this idea for the four-dimensional
“cube” and the four-dimensional “simplex.”

The study of geometry in higher dimensions not
only provides new ways of visualizing abstract algebraic
concepts, but also creates tools that may be applied in $\mathbb{R}^3$.
For instance, Sections 8.2 and 8.6 include applications to
computer graphics, and Section 8.5 outlines a proof (in
Exercise 21) that there are only five regular polyhedra in
$\mathbb{R}^3$.

FIGURE 1 The five Platonic solids.


Most applications in earlier chapters involved algebraic calculations with subspaces
and linear combinations of vectors. This chapter studies sets of vectors that can be
visualized as geometric objects such as line segments, polygons, and solid objects.
Individual vectors are viewed as points. The concepts introduced here are used in
computer graphics, linear programming (in Chapter 9), and other areas of mathematics.$^1$

Throughout the chapter, sets of vectors are described by linear combinations, but
with various restrictions on the weights used in the combinations. For instance, in
Section 8.1, the sum of the weights is 1, while in Section 8.2, the weights are nonnegative
and sum to 1. The visualizations are in $\mathbb{R}^2$ or $\mathbb{R}^3$, of course, but the concepts also apply
to $\mathbb{R}^n$ and other vector spaces.

8.1 AFFINE COMBINATIONS 


An affine combination of vectors is a special kind of linear combination. Given vectors
(or “points”) $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$ and scalars $c_1, \ldots, c_p$, an affine combination of
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ is a linear combination

$$c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$$

such that the weights satisfy $c_1 + \cdots + c_p = 1$.

1 See Foley, van Dam, Feiner, and Hughes, Computer Graphics: Principles and Practice, 2nd edition
(Boston: Addison-Wesley, 1996), pp. 1083-1112. That material also discusses coordinate-free “affine
spaces.”

DEFINITION

The set of all affine combinations of points in a set $S$ is called the affine hull (or
affine span) of $S$, denoted by aff $S$.


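The affine hull of a single point $\mathbf{v}_1$ is just the set $\{\mathbf{v}_1\}$, since it has the form $c_1\mathbf{v}_1$ where
$c_1 = 1$. The affine hull of two distinct points is often written in a special way. Suppose
$\mathbf{y} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$ with $c_1 + c_2 = 1$. Write $t$ in place of $c_2$, so that $c_1 = 1 - c_2 = 1 - t$.
Then the affine hull of $\{\mathbf{v}_1, \mathbf{v}_2\}$ is the set

$$\mathbf{y} = (1 - t)\mathbf{v}_1 + t\mathbf{v}_2, \quad \text{with } t \text{ in } \mathbb{R} \tag{1}$$

This set of points includes $\mathbf{v}_1$ (when $t = 0$) and $\mathbf{v}_2$ (when $t = 1$). If $\mathbf{v}_2 = \mathbf{v}_1$, then (1)
again describes just one point. Otherwise, (1) describes the line through $\mathbf{v}_1$ and $\mathbf{v}_2$. To
see this, rewrite (1) in the form

$$\mathbf{y} = \mathbf{v}_1 + t(\mathbf{v}_2 - \mathbf{v}_1) = \mathbf{p} + t\mathbf{u}, \quad \text{with } t \text{ in } \mathbb{R}$$

where $\mathbf{p}$ is $\mathbf{v}_1$ and $\mathbf{u}$ is $\mathbf{v}_2 - \mathbf{v}_1$. The set of all multiples of $\mathbf{u}$ is $\operatorname{Span}\{\mathbf{u}\}$, the line through
$\mathbf{u}$ and the origin. Adding $\mathbf{p}$ to each point on this line translates $\operatorname{Span}\{\mathbf{u}\}$ into the line
through $\mathbf{p}$ parallel to the line through $\mathbf{u}$ and the origin. See Fig. 1. (Compare this figure
with Fig. 5 in Section 1.5.)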



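Figure 2 uses the original points $\mathbf{v}_1$ and $\mathbf{v}_2$, and displays $\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2\}$ as the line
through $\mathbf{v}_1$ and $\mathbf{v}_2$.

Notice that while the point $\mathbf{y}$ in Fig. 2 is an affine combination of $\mathbf{v}_1$ and $\mathbf{v}_2$, the point
$\mathbf{y} - \mathbf{v}_1$ equals $t(\mathbf{v}_2 - \mathbf{v}_1)$, which is a linear combination (in fact, a multiple) of $\mathbf{v}_2 - \mathbf{v}_1$.
This relation between $\mathbf{y}$ and $\mathbf{y} - \mathbf{v}_1$ holds for any affine combination of points, as the
following theorem shows.

THEOREM 1

A point $\mathbf{y}$ in $\mathbb{R}^n$ is an affine combination of $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$ if and only if $\mathbf{y} - \mathbf{v}_1$
is a linear combination of the translated points $\mathbf{v}_2 - \mathbf{v}_1, \ldots, \mathbf{v}_p - \mathbf{v}_1$.

PROOF If $\mathbf{y} - \mathbf{v}_1$ is a linear combination of $\mathbf{v}_2 - \mathbf{v}_1, \ldots, \mathbf{v}_p - \mathbf{v}_1$, there exist weights
$c_2, \ldots, c_p$ such that

$$\mathbf{y} - \mathbf{v}_1 = c_2(\mathbf{v}_2 - \mathbf{v}_1) + \cdots + c_p(\mathbf{v}_p - \mathbf{v}_1) \tag{2}$$

Then

$$\mathbf{y} = (1 - c_2 - \cdots - c_p)\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p \tag{3}$$

and the weights in this linear combination sum to 1. So $\mathbf{y}$ is an affine combination of
$\mathbf{v}_1, \ldots, \mathbf{v}_p$. Conversely, suppose

$$\mathbf{y} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p \tag{4}$$

where $c_1 + \cdots + c_p = 1$. Since $c_1 = 1 - c_2 - \cdots - c_p$, equation (4) may be written
as in (3), and this leads to (2), which shows that $\mathbf{y} - \mathbf{v}_1$ is a linear combination of
$\mathbf{v}_2 - \mathbf{v}_1, \ldots, \mathbf{v}_p - \mathbf{v}_1$. ■

In the statement of Theorem 1, the point $\mathbf{v}_1$ could be replaced by any of the other
points in the list $\mathbf{v}_1, \ldots, \mathbf{v}_p$. Only the notation in the proof would change.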


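EXAMPLE 1 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$, $\mathbf{v}_4 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$, and $\mathbf{y} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$.
If possible, write $\mathbf{y}$ as an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$.

SOLUTION Compute the translated points

$$\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad \mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_4 - \mathbf{v}_1 = \begin{bmatrix} -3 \\ 0 \end{bmatrix}, \quad \mathbf{y} - \mathbf{v}_1 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$$

To find scalars $c_2$, $c_3$, and $c_4$ such that

$$c_2(\mathbf{v}_2 - \mathbf{v}_1) + c_3(\mathbf{v}_3 - \mathbf{v}_1) + c_4(\mathbf{v}_4 - \mathbf{v}_1) = \mathbf{y} - \mathbf{v}_1 \tag{5}$$

row reduce the augmented matrix having these points as columns:

$$\begin{bmatrix} 1 & 0 & -3 & 3 \\ 3 & 1 & 0 & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -3 & 3 \\ 0 & 1 & 9 & -10 \end{bmatrix}$$

This shows that equation (5) is consistent, and the general solution is $c_2 = 3c_4 + 3$,
$c_3 = -9c_4 - 10$, with $c_4$ free. When $c_4 = 0$,

$$\mathbf{y} - \mathbf{v}_1 = 3(\mathbf{v}_2 - \mathbf{v}_1) - 10(\mathbf{v}_3 - \mathbf{v}_1) + 0(\mathbf{v}_4 - \mathbf{v}_1)$$

and

$$\mathbf{y} = 8\mathbf{v}_1 + 3\mathbf{v}_2 - 10\mathbf{v}_3$$

As another example, take $c_4 = 1$. Then $c_2 = 6$ and $c_3 = -19$, so

$$\mathbf{y} - \mathbf{v}_1 = 6(\mathbf{v}_2 - \mathbf{v}_1) - 19(\mathbf{v}_3 - \mathbf{v}_1) + 1(\mathbf{v}_4 - \mathbf{v}_1)$$

and

$$\mathbf{y} = 13\mathbf{v}_1 + 6\mathbf{v}_2 - 19\mathbf{v}_3 + \mathbf{v}_4$$

■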
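The two answers in Example 1 can be checked numerically. The following is a minimal sketch in Python; the helper name `affine_combination` is introduced here only for illustration and is not part of the text. It forms the combination and refuses weights that do not sum to 1.

```python
from fractions import Fraction

def affine_combination(weights, points):
    """Return sum(c_i * v_i) after checking that the weights sum to 1."""
    weights = [Fraction(c) for c in weights]
    if sum(weights) != 1:
        raise ValueError("weights of an affine combination must sum to 1")
    dim = len(points[0])
    return [sum(c * p[i] for c, p in zip(weights, points)) for i in range(dim)]

# Points of Example 1
v1, v2, v3, v4 = [1, 2], [2, 5], [1, 3], [-2, 2]
y = [4, 1]

# Weights found with c4 = 0 and with c4 = 1
first = affine_combination([8, 3, -10, 0], [v1, v2, v3, v4])
second = affine_combination([13, 6, -19, 1], [v1, v2, v3, v4])
print(first == y, second == y)
```

Both comparisons should agree with $\mathbf{y}$, which illustrates that a point can have more than one representation as an affine combination when the translated points are linearly dependent.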


While the procedure in Example 1 works for arbitrary points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$,
the question can be answered more directly if the chosen points $\mathbf{v}_i$ are a basis for $\mathbb{R}^n$.
For example, let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be such a basis. Then any $\mathbf{y}$ in $\mathbb{R}^n$ is a unique linear
combination of $\mathbf{b}_1, \ldots, \mathbf{b}_n$. This combination is an affine combination of the $\mathbf{b}$'s if and
only if the weights sum to 1. (These weights are just the $\mathcal{B}$-coordinates of $\mathbf{y}$, as in Section
4.4.)


























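EXAMPLE 2 Let $\mathbf{b}_1 = \begin{bmatrix} 4 \\ 0 \\ 3 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 0 \\ 4 \\ 2 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 5 \\ 2 \\ 4 \end{bmatrix}$, $\mathbf{p}_1 = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$, and $\mathbf{p}_2 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$.
The set $\mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ is a basis for $\mathbb{R}^3$. Determine whether the points $\mathbf{p}_1$ and $\mathbf{p}_2$ are
affine combinations of the points in $\mathcal{B}$.

SOLUTION Find the $\mathcal{B}$-coordinates of $\mathbf{p}_1$ and $\mathbf{p}_2$. These two calculations can be combined
by row reducing the matrix $[\,\mathbf{b}_1 \ \mathbf{b}_2 \ \mathbf{b}_3 \ \mathbf{p}_1 \ \mathbf{p}_2\,]$, with two augmented columns:

$$\begin{bmatrix} 4 & 0 & 5 & 2 & 1 \\ 0 & 4 & 2 & 0 & 2 \\ 3 & 2 & 4 & 0 & 2 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & -2 & \tfrac{2}{3} \\ 0 & 1 & 0 & -1 & \tfrac{2}{3} \\ 0 & 0 & 1 & 2 & -\tfrac{1}{3} \end{bmatrix}$$

Read column 4 to build $\mathbf{p}_1$, and read column 5 to build $\mathbf{p}_2$:

$$\mathbf{p}_1 = -2\mathbf{b}_1 - \mathbf{b}_2 + 2\mathbf{b}_3 \quad \text{and} \quad \mathbf{p}_2 = \tfrac{2}{3}\mathbf{b}_1 + \tfrac{2}{3}\mathbf{b}_2 - \tfrac{1}{3}\mathbf{b}_3$$

The sum of the weights in the linear combination for $\mathbf{p}_1$ is $-1$, not 1, so $\mathbf{p}_1$ is not an
affine combination of the $\mathbf{b}$'s. However, $\mathbf{p}_2$ is an affine combination of the $\mathbf{b}$'s, because
the sum of the weights for $\mathbf{p}_2$ is 1. ■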


DEFINITION

A set $S$ is affine if $\mathbf{p}, \mathbf{q} \in S$ implies that $(1 - t)\mathbf{p} + t\mathbf{q} \in S$ for each real number $t$.

Geometrically, a set is affine if whenever two points are in the set, the entire line
through these points is in the set. (If $S$ contains only one point, $\mathbf{p}$, then the line through
$\mathbf{p}$ and $\mathbf{p}$ is just a point, a "degenerate" line.) Algebraically, for a set $S$ to be affine,
the definition requires that every affine combination of two points of $S$ belong to $S$.
Remarkably, this is equivalent to requiring that $S$ contain every affine combination of
an arbitrary number of points of $S$.

THEOREM 2

A set $S$ is affine if and only if every affine combination of points of $S$ lies in $S$.
That is, $S$ is affine if and only if $S = \operatorname{aff} S$.

PROOF Suppose that $S$ is affine and use induction on the number $m$ of points of $S$
occurring in an affine combination. When $m$ is 1 or 2, an affine combination of $m$ points
of $S$ lies in $S$, by the definition of an affine set. Now, assume that every affine combination
of $k$ or fewer points of $S$ yields a point in $S$, and consider a combination of $k + 1$
points. Take $\mathbf{v}_i$ in $S$ for $i = 1, \ldots, k + 1$, and let $\mathbf{y} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k + c_{k+1}\mathbf{v}_{k+1}$,
where $c_1 + \cdots + c_{k+1} = 1$. Since the $c_i$'s sum to 1, at least one of them must not be
equal to 1. By re-indexing the $\mathbf{v}_i$ and $c_i$, if necessary, we may assume that $c_{k+1} \neq 1$.
Let $t = c_1 + \cdots + c_k$. Then $t = 1 - c_{k+1} \neq 0$, and

$$\mathbf{y} = (1 - c_{k+1})\left(\frac{c_1}{t}\mathbf{v}_1 + \cdots + \frac{c_k}{t}\mathbf{v}_k\right) + c_{k+1}\mathbf{v}_{k+1} \tag{6}$$

By the induction hypothesis, the point $\mathbf{z} = (c_1/t)\mathbf{v}_1 + \cdots + (c_k/t)\mathbf{v}_k$ is in $S$, since the
coefficients sum to 1. Thus (6) displays $\mathbf{y}$ as an affine combination of two points in $S$,
and so $\mathbf{y} \in S$. By the principle of induction, every affine combination of such points lies
in $S$. That is, $\operatorname{aff} S \subset S$. But the reverse inclusion, $S \subset \operatorname{aff} S$, always applies. Thus,
when $S$ is affine, $S = \operatorname{aff} S$. Conversely, if $S = \operatorname{aff} S$, then affine combinations of two
(or more) points of $S$ lie in $S$, so $S$ is affine. ■
The next definition provides terminology for affine sets that emphasizes their close
connection with subspaces of $\mathbb{R}^n$.

DEFINITION

A translate of a set $S$ in $\mathbb{R}^n$ by a vector $\mathbf{p}$ is the set $S + \mathbf{p} = \{\mathbf{s} + \mathbf{p} : \mathbf{s} \in S\}$.² A flat
in $\mathbb{R}^n$ is a translate of a subspace of $\mathbb{R}^n$. Two flats are parallel if one is a translate
of the other. The dimension of a flat is the dimension of the corresponding parallel
subspace. The dimension of a set $S$, written as $\dim S$, is the dimension of the
smallest flat containing $S$. A line in $\mathbb{R}^n$ is a flat of dimension 1. A hyperplane in
$\mathbb{R}^n$ is a flat of dimension $n - 1$.

In $\mathbb{R}^3$, the proper subspaces³ consist of the origin $\mathbf{0}$, the set of all lines through
$\mathbf{0}$, and the set of all planes through $\mathbf{0}$. Thus the proper flats in $\mathbb{R}^3$ are points
(zero-dimensional), lines (one-dimensional), and planes (two-dimensional), which may or
may not pass through the origin.

The next theorem shows that these geometric descriptions of lines and planes in $\mathbb{R}^3$
(as translates of subspaces) actually coincide with their earlier algebraic descriptions as
sets of all affine combinations of two or three points, respectively.

THEOREM 3

A nonempty set $S$ is affine if and only if it is a flat.


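PROOF Suppose that $S$ is affine. Let $\mathbf{p}$ be any fixed point in $S$ and let $W = S + (-\mathbf{p})$,
so that $S = W + \mathbf{p}$. To show that $S$ is a flat, it suffices to show that $W$ is a subspace of
$\mathbb{R}^n$. Since $\mathbf{p}$ is in $S$, the zero vector is in $W$. To show that $W$ is closed under sums and
scalar multiples, it suffices to show that if $\mathbf{u}_1$ and $\mathbf{u}_2$ are elements of $W$, then $\mathbf{u}_1 + t\mathbf{u}_2$
is in $W$ for every real $t$. Since $\mathbf{u}_1$ and $\mathbf{u}_2$ are in $W$, there exist $\mathbf{s}_1$ and $\mathbf{s}_2$ in $S$ such that
$\mathbf{u}_1 = \mathbf{s}_1 - \mathbf{p}$ and $\mathbf{u}_2 = \mathbf{s}_2 - \mathbf{p}$. So, for each real $t$,

$$\begin{aligned} \mathbf{u}_1 + t\mathbf{u}_2 &= (\mathbf{s}_1 - \mathbf{p}) + t(\mathbf{s}_2 - \mathbf{p}) \\ &= (1 - t)\mathbf{s}_1 + t(\mathbf{s}_1 + \mathbf{s}_2 - \mathbf{p}) - \mathbf{p} \end{aligned}$$

Let $\mathbf{y} = \mathbf{s}_1 + \mathbf{s}_2 - \mathbf{p}$. Then $\mathbf{y}$ is an affine combination of points in $S$. Since $S$ is affine,
$\mathbf{y}$ is in $S$ (by Theorem 2). But then $(1 - t)\mathbf{s}_1 + t\mathbf{y}$ is also in $S$. So $\mathbf{u}_1 + t\mathbf{u}_2$ is in
$-\mathbf{p} + S = W$. This shows that $W$ is a subspace of $\mathbb{R}^n$. Thus $S$ is a flat, because
$S = W + \mathbf{p}$.

Conversely, suppose $S$ is a flat. That is, $S = W + \mathbf{p}$ for some $\mathbf{p} \in \mathbb{R}^n$ and some
subspace $W$. To show that $S$ is affine, it suffices to show that for any pair $\mathbf{s}_1$ and $\mathbf{s}_2$ of
points in $S$, the line through $\mathbf{s}_1$ and $\mathbf{s}_2$ lies in $S$. By definition of $W$, there exist $\mathbf{u}_1$ and
$\mathbf{u}_2$ in $W$ such that $\mathbf{s}_1 = \mathbf{u}_1 + \mathbf{p}$ and $\mathbf{s}_2 = \mathbf{u}_2 + \mathbf{p}$. So, for each real $t$,

$$\begin{aligned} (1 - t)\mathbf{s}_1 + t\mathbf{s}_2 &= (1 - t)(\mathbf{u}_1 + \mathbf{p}) + t(\mathbf{u}_2 + \mathbf{p}) \\ &= (1 - t)\mathbf{u}_1 + t\mathbf{u}_2 + \mathbf{p} \end{aligned}$$

Since $W$ is a subspace, $(1 - t)\mathbf{u}_1 + t\mathbf{u}_2 \in W$ and so $(1 - t)\mathbf{s}_1 + t\mathbf{s}_2 \in W + \mathbf{p} = S$.
Thus $S$ is affine. ■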


² If $\mathbf{p} = \mathbf{0}$, then the translate is just $S$ itself. See Fig. 4 in Section 1.5.

³ A subset $A$ of a set $B$ is called a proper subset of $B$ if $A \neq B$. The same condition applies to proper
subspaces and proper flats in $\mathbb{R}^n$: they are not equal to $\mathbb{R}^n$.

Theorem 3 provides a geometric way to view the affine hull of a set: it is the flat that
consists of all the affine combinations of points in the set. For instance, Fig. 3 shows
the points studied in Example 2. Although the set of all linear combinations of $\mathbf{b}_1$, $\mathbf{b}_2$,
and $\mathbf{b}_3$ is all of $\mathbb{R}^3$, the set of all affine combinations is only the plane through $\mathbf{b}_1$, $\mathbf{b}_2$,
and $\mathbf{b}_3$. Note that $\mathbf{p}_2$ (from Example 2) is in the plane through $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$, while $\mathbf{p}_1$
is not in that plane. Also, see Exercise 14.

The next example takes a fresh look at a familiar set—the set of all solutions of a 
system Ax = b. 

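EXAMPLE 3 Suppose that the solutions of an equation $A\mathbf{x} = \mathbf{b}$ are all of the form
$\mathbf{x} = x_3\mathbf{u} + \mathbf{p}$, where $\mathbf{u} = \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} 4 \\ 0 \\ -3 \end{bmatrix}$. Recall from Section 1.5 that this
set is parallel to the solution set of $A\mathbf{x} = \mathbf{0}$, which consists of all points of the form $x_3\mathbf{u}$.
Find points $\mathbf{v}_1$ and $\mathbf{v}_2$ such that the solution set of $A\mathbf{x} = \mathbf{b}$ is $\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2\}$.

SOLUTION The solution set is a line through $\mathbf{p}$ in the direction of $\mathbf{u}$, as in Fig. 1. Since
$\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2\}$ is a line through $\mathbf{v}_1$ and $\mathbf{v}_2$, identify two points on the line $\mathbf{x} = x_3\mathbf{u} + \mathbf{p}$. Two
simple choices appear when $x_3 = 0$ and $x_3 = 1$. That is, take $\mathbf{v}_1 = \mathbf{p}$ and $\mathbf{v}_2 = \mathbf{u} + \mathbf{p}$,
so that

$$\mathbf{v}_2 = \mathbf{u} + \mathbf{p} = \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix} + \begin{bmatrix} 4 \\ 0 \\ -3 \end{bmatrix} = \begin{bmatrix} 6 \\ -3 \\ -2 \end{bmatrix}$$

In this case, the solution set is described as the set of all affine combinations of the form

$$\mathbf{x} = (1 - x_3)\begin{bmatrix} 4 \\ 0 \\ -3 \end{bmatrix} + x_3\begin{bmatrix} 6 \\ -3 \\ -2 \end{bmatrix}$$

■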
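The description of the solution line in Example 3 as an affine hull can be spot-checked in code. This is only an illustrative sketch; the function name `affine_point` is an assumption made here, not notation from the text. It compares $(1 - x_3)\mathbf{v}_1 + x_3\mathbf{v}_2$ with $x_3\mathbf{u} + \mathbf{p}$ for a few values of $x_3$.

```python
def affine_point(t, v1, v2):
    """Return (1 - t) v1 + t v2, the affine combination with weights 1 - t and t."""
    return [(1 - t) * a + t * b for a, b in zip(v1, v2)]

u = [2, -3, 1]
p = [4, 0, -3]
v1 = p                                    # the point for x3 = 0
v2 = [ui + pi for ui, pi in zip(u, p)]    # the point for x3 = 1, namely u + p

for x3 in (-2, 0, 1, 5):
    direct = [x3 * ui + pi for ui, pi in zip(u, p)]   # x3*u + p
    assert affine_point(x3, v1, v2) == direct
print(v2)
```

The two parametrizations agree because $(1 - x_3)\mathbf{p} + x_3(\mathbf{u} + \mathbf{p}) = \mathbf{p} + x_3\mathbf{u}$ for every $x_3$.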


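Earlier, Theorem 1 displayed an important connection between affine combinations
and linear combinations. The next theorem provides another view of affine combinations,
which for $\mathbb{R}^2$ and $\mathbb{R}^3$ is closely connected to applications in computer graphics,
discussed in the next section (and in Section 2.7).

DEFINITION

For $\mathbf{v}$ in $\mathbb{R}^n$, the standard homogeneous form of $\mathbf{v}$ is the point $\tilde{\mathbf{v}} = \begin{bmatrix} \mathbf{v} \\ 1 \end{bmatrix}$ in $\mathbb{R}^{n+1}$.

THEOREM 4

A point $\mathbf{y}$ in $\mathbb{R}^n$ is an affine combination of $\mathbf{v}_1, \ldots, \mathbf{v}_p$ in $\mathbb{R}^n$ if and only if the
homogeneous form of $\mathbf{y}$ is in $\operatorname{Span}\{\tilde{\mathbf{v}}_1, \ldots, \tilde{\mathbf{v}}_p\}$. In fact, $\mathbf{y} = c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p$,
with $c_1 + \cdots + c_p = 1$, if and only if $\tilde{\mathbf{y}} = c_1\tilde{\mathbf{v}}_1 + \cdots + c_p\tilde{\mathbf{v}}_p$.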


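PROOF A point $\mathbf{y}$ is in $\operatorname{aff}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ if and only if there exist weights $c_1, \ldots, c_p$ such
that

$$\begin{bmatrix} \mathbf{y} \\ 1 \end{bmatrix} = c_1\begin{bmatrix} \mathbf{v}_1 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} \mathbf{v}_2 \\ 1 \end{bmatrix} + \cdots + c_p\begin{bmatrix} \mathbf{v}_p \\ 1 \end{bmatrix}$$

This happens if and only if $\tilde{\mathbf{y}}$ is in $\operatorname{Span}\{\tilde{\mathbf{v}}_1, \tilde{\mathbf{v}}_2, \ldots, \tilde{\mathbf{v}}_p\}$. ■

EXAMPLE 4 Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 1 \\ 7 \\ 1 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 4 \\ 3 \\ 0 \end{bmatrix}$. Use Theorem 4
to write $\mathbf{p}$ as an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, if possible.

SOLUTION Row reduce the augmented matrix for the equation

$$x_1\tilde{\mathbf{v}}_1 + x_2\tilde{\mathbf{v}}_2 + x_3\tilde{\mathbf{v}}_3 = \tilde{\mathbf{p}}$$

To simplify the arithmetic, move the fourth row of 1's to the top (equivalent to three
row interchanges). After this, the number of arithmetic operations here is basically the
same as the number needed for the method using Theorem 1.

$$[\,\tilde{\mathbf{v}}_1 \ \tilde{\mathbf{v}}_2 \ \tilde{\mathbf{v}}_3 \ \tilde{\mathbf{p}}\,] \sim \begin{bmatrix} 1 & 1 & 1 & 1 \\ 3 & 1 & 1 & 4 \\ 1 & 2 & 7 & 3 \\ 1 & 2 & 1 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -2 & -2 & 1 \\ 0 & 1 & 6 & 2 \\ 0 & 1 & 0 & -1 \end{bmatrix} \sim \cdots \sim \begin{bmatrix} 1 & 0 & 0 & 1.5 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & .5 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

By Theorem 4, $1.5\mathbf{v}_1 - \mathbf{v}_2 + .5\mathbf{v}_3 = \mathbf{p}$. See Fig. 4, which shows the plane that contains
$\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{p}$ (together with points on the coordinate axes). ■

FIGURE 4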
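The homogeneous-form test of Theorem 4 can be sketched in Python with exact rational arithmetic. The function names `rref` and `affine_weights` are assumptions made for this illustration, and the elimination routine is a plain textbook Gauss-Jordan reduction rather than a tuned implementation. When free variables occur they are set to 0, so the returned weights are one valid choice and need not be unique.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        pivot = m[pivot_row][col]
        m[pivot_row] = [x / pivot for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

def affine_weights(points, target):
    """Weights c with sum(c) = 1 and sum(c_i v_i) = target, or None if none exist."""
    n = len(target)
    # Augmented matrix whose columns are the homogeneous forms, target last.
    rows = [[p[i] for p in points] + [target[i]] for i in range(n)]
    rows.append([1] * len(points) + [1])
    weights = [Fraction(0)] * len(points)
    for row in rref(rows):
        lead = next((j for j, x in enumerate(row) if x != 0), None)
        if lead is None:
            continue                  # zero row
        if lead == len(points):
            return None               # row [0 ... 0 | 1]: inconsistent system
        weights[lead] = row[-1]       # basic variable, free variables set to 0
    return weights

w = affine_weights([[3, 1, 1], [1, 2, 2], [1, 7, 1]], [4, 3, 0])
print(w)
```

For the data of Example 4 this reproduces the weights $1.5$, $-1$, and $.5$; a target outside the plane (the origin, for instance) yields `None`.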


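PRACTICE PROBLEM

Plot the points $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$ on graph paper, and
explain why $\mathbf{p}$ must be an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Then find the affine
combination for $\mathbf{p}$. [Hint: What is the dimension of $\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$?]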


8.1 EXERCISES 


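In Exercises 1-4, write $\mathbf{y}$ as an affine combination of the other
points listed, if possible.

1. $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}$, $\mathbf{v}_4 = \begin{bmatrix} 3 \\ 7 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$

2. $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 5 \\ 7 \end{bmatrix}$

3. $\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 0 \\ 4 \\ -2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ -2 \\ 6 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 17 \\ 1 \\ 5 \end{bmatrix}$

4. $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ -6 \\ 7 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ 3 \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} -3 \\ 4 \\ -4 \end{bmatrix}$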


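In Exercises 5 and 6, let $\mathbf{b}_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 2 \\ -5 \\ 1 \end{bmatrix}$,
and $S = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$. Note that $S$ is an orthogonal basis for $\mathbb{R}^3$.
Write each of the given points as an affine combination of the
points in the set $S$, if possible. [Hint: Use Theorem 5 in Section
6.2 instead of row reduction to find the weights.]

5. a. $\mathbf{p}_1 = \begin{bmatrix} 3 \\ 8 \\ 4 \end{bmatrix}$  b. $\mathbf{p}_2 = \begin{bmatrix} 6 \\ -3 \\ 3 \end{bmatrix}$  c. $\mathbf{p}_3 = \begin{bmatrix} 0 \\ -1 \\ -5 \end{bmatrix}$

6. a. $\mathbf{p}_1 = \begin{bmatrix} 0 \\ -19 \\ -5 \end{bmatrix}$  b. $\mathbf{p}_2 = \begin{bmatrix} 1.5 \\ -1.3 \\ -.5 \end{bmatrix}$  c. $\mathbf{p}_3 = \begin{bmatrix} 5 \\ -4 \\ 0 \end{bmatrix}$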

7. Let 



_r 


2" 


"-1" 

Vi = 

0 

3 

, V2 = 

-1 

0 

, V 3 = 

2 

1 


0 


4 


1 



5' 


"-9" 


"4" 


-3 


10 


2 

Pi = 

5 

， P2 = 

9 

， P3 = 

8 


3 


13_ 


5 

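and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. It can be shown that $S$ is linearly
independent.

a. Is $\mathbf{p}_1$ in $\operatorname{Span} S$? Is $\mathbf{p}_1$ in $\operatorname{aff} S$?

b. Is $\mathbf{p}_2$ in $\operatorname{Span} S$? Is $\mathbf{p}_2$ in $\operatorname{aff} S$?

c. Is $\mathbf{p}_3$ in $\operatorname{Span} S$? Is $\mathbf{p}_3$ in $\operatorname{aff} S$?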


8. Repeat Exercise 7 when 



_ r 


2" 


3" 


0 


1 


0 

Vi = 

3 

, v 2 = 

6 

, v 3 = 

12 


_-2_ 


_-5_ 


—6 


4" 


"-5" 



Pi = 

-1 

15 

， P 2 = 

3 

-8 

,and p 3 = 


-7 


6 




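9. Suppose that the solutions of an equation $A\mathbf{x} = \mathbf{b}$ are all of
the form $\mathbf{x} = x_3\mathbf{u} + \mathbf{p}$, where $\mathbf{u} = \begin{bmatrix} 4 \\ -2 \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} -3 \\ 0 \end{bmatrix}$.
Find points $\mathbf{v}_1$ and $\mathbf{v}_2$ such that the solution set of $A\mathbf{x} = \mathbf{b}$ is
$\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2\}$.

10. Suppose that the solutions of an equation $A\mathbf{x} = \mathbf{b}$ are all of
the form $\mathbf{x} = x_3\mathbf{u} + \mathbf{p}$, where $\mathbf{u} = \begin{bmatrix} 5 \\ 1 \\ -2 \end{bmatrix}$ and $\mathbf{p} = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix}$.
Find points $\mathbf{v}_1$ and $\mathbf{v}_2$ such that the solution set of $A\mathbf{x} = \mathbf{b}$ is
$\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2\}$.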


In Exercises 11 and 12, mark each statement True or False. Justify 
each answer. 

11. a. The set of all affine combinations of points in a set S is 
called the affine hull of S. 


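b. If $\{\mathbf{b}_1, \ldots, \mathbf{b}_k\}$ is a linearly independent subset of $\mathbb{R}^n$ and
if $\mathbf{p}$ is a linear combination of $\mathbf{b}_1, \ldots, \mathbf{b}_k$, then $\mathbf{p}$ is an
affine combination of $\mathbf{b}_1, \ldots, \mathbf{b}_k$.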

c. The affine hull of two distinct points is called a line. 

d. A flat is a subspace. 

e. A plane in R 3 is a hyperplane. 

12. a. If $S = \{\mathbf{x}\}$, then $\operatorname{aff} S$ is the empty set.

b. A set is affine if and only if it contains its affine hull. 

c. A flat of dimension 1 is called a line. 

d. A flat of dimension 2 is called a hyperplane. 

e. A flat through the origin is a subspace. 

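13. Suppose $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $\mathbb{R}^3$. Show that
$\operatorname{Span}\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1\}$ is a plane in $\mathbb{R}^3$. [Hint: What can
you say about $\mathbf{u}$ and $\mathbf{v}$ when $\operatorname{Span}\{\mathbf{u}, \mathbf{v}\}$ is a plane?]

14. Show that if $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a basis for $\mathbb{R}^3$, then $\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$
is the plane through $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

15. Let $A$ be an $m \times n$ matrix and, given $\mathbf{b}$ in $\mathbb{R}^m$, show that the
set $S$ of all solutions of $A\mathbf{x} = \mathbf{b}$ is an affine subset of $\mathbb{R}^n$.

16. Let $\mathbf{v} \in \mathbb{R}^n$ and let $k \in \mathbb{R}$. Prove that $S = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} \cdot \mathbf{v} = k\}$
is an affine subset of $\mathbb{R}^n$.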

17. Choose a set S of three points such that aff S is the plane in 
R 3 whose equation is = 5. Justify your work. 

18. Choose a set S of four distinct points in R 3 such that aff S is 
the plane 2x\ X 2 — 3^3 = 12. Justify your work. 

19. Let $S$ be an affine subset of $\mathbb{R}^n$, suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ is a
linear transformation, and let $f(S)$ denote the set of images
$\{f(\mathbf{x}) : \mathbf{x} \in S\}$. Prove that $f(S)$ is an affine subset of $\mathbb{R}^m$.

20. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, let $T$ be an
affine subset of $\mathbb{R}^m$, and let $S = \{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) \in T\}$. Show
that $S$ is an affine subset of $\mathbb{R}^n$.

In Exercises 21-26, prove the given statement about subsets $A$
and $B$ of $\mathbb{R}^n$, or provide the required example in $\mathbb{R}^2$. A proof
for an exercise may use results from earlier exercises (as well as
theorems already available in the text).

21. If $A \subset B$ and $B$ is affine, then $\operatorname{aff} A \subset B$.

22. If $A \subset B$, then $\operatorname{aff} A \subset \operatorname{aff} B$.

23. $[(\operatorname{aff} A) \cup (\operatorname{aff} B)] \subset \operatorname{aff}(A \cup B)$. [Hint: To show that
$D \cup E \subset F$, show that $D \subset F$ and $E \subset F$.]

24. Find an example in $\mathbb{R}^2$ to show that equality need not hold in
the statement of Exercise 23. [Hint: Consider sets $A$ and $B$,
each of which contains only one or two points.]

25. $\operatorname{aff}(A \cap B) \subset (\operatorname{aff} A \cap \operatorname{aff} B)$.

26. Find an example in $\mathbb{R}^2$ to show that equality need not hold in
the statement of Exercise 25.
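SOLUTION TO PRACTICE PROBLEM

Since the points $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are not collinear (that is, not on a single line),
$\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ cannot be one-dimensional. Thus, $\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ must equal $\mathbb{R}^2$. To
find the actual weights used to express $\mathbf{p}$ as an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, first
compute

$$\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} -2 \\ 2 \end{bmatrix}, \quad \mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad \text{and} \quad \mathbf{p} - \mathbf{v}_1 = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

To write $\mathbf{p} - \mathbf{v}_1$ as a linear combination of $\mathbf{v}_2 - \mathbf{v}_1$ and $\mathbf{v}_3 - \mathbf{v}_1$, row reduce the matrix
having these points as columns:

$$\begin{bmatrix} -2 & 2 & 3 \\ 2 & 1 & 3 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & \tfrac{1}{2} \\ 0 & 1 & 2 \end{bmatrix}$$

Thus $\mathbf{p} - \mathbf{v}_1 = \tfrac{1}{2}(\mathbf{v}_2 - \mathbf{v}_1) + 2(\mathbf{v}_3 - \mathbf{v}_1)$, which shows that

$$\mathbf{p} = \left(1 - \tfrac{1}{2} - 2\right)\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2 + 2\mathbf{v}_3 = -\tfrac{3}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2 + 2\mathbf{v}_3$$

This expresses $\mathbf{p}$ as an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, because the coefficients sum
to 1.

Alternatively, use the method of Example 4 (homogeneous forms) and row reduce:

$$[\,\tilde{\mathbf{v}}_1 \ \tilde{\mathbf{v}}_2 \ \tilde{\mathbf{v}}_3 \ \tilde{\mathbf{p}}\,] \sim \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 3 & 4 \\ 0 & 2 & 1 & 3 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & -\tfrac{3}{2} \\ 0 & 1 & 0 & \tfrac{1}{2} \\ 0 & 0 & 1 & 2 \end{bmatrix}$$

This shows that $\mathbf{p} = -\tfrac{3}{2}\mathbf{v}_1 + \tfrac{1}{2}\mathbf{v}_2 + 2\mathbf{v}_3$.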


8.2 AFFINE INDEPENDENCE 


This section continues to explore the relation between linear concepts and affine concepts.
Consider first a set of three vectors in $\mathbb{R}^3$, say $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. If $S$ is linearly
dependent, then one of the vectors is a linear combination of the other two vectors. What
happens when one of the vectors is an affine combination of the others? For instance,
suppose that

$$\mathbf{v}_3 = (1 - t)\mathbf{v}_1 + t\mathbf{v}_2, \quad \text{for some } t \text{ in } \mathbb{R}.$$

Then

$$(1 - t)\mathbf{v}_1 + t\mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0}.$$

This is a linear dependence relation because not all the weights are zero. But more is
true: the weights in the dependence relation sum to 0:

$$(1 - t) + t + (-1) = 0.$$

This is the additional property needed to define affine dependence.

DEFINITION

An indexed set of points $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ is affinely dependent if there exist
real numbers $c_1, \ldots, c_p$, not all zero, such that

$$c_1 + \cdots + c_p = 0 \quad \text{and} \quad c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{0} \tag{1}$$

Otherwise, the set is affinely independent.
An affine combination is a special type of linear combination, and affine dependence
is a restricted type of linear dependence. Thus, each affinely dependent set is
automatically linearly dependent.

A set $\{\mathbf{v}_1\}$ of only one point (even the zero vector) must be affinely independent
because the required properties of the coefficients $c_i$ cannot be satisfied when there is
only one coefficient. For $\{\mathbf{v}_1\}$, the first equation in (1) is just $c_1 = 0$, and yet at least one
(the only one) coefficient must be nonzero.

Exercise 13 asks you to show that an indexed set $\{\mathbf{v}_1, \mathbf{v}_2\}$ is affinely dependent if
and only if $\mathbf{v}_1 = \mathbf{v}_2$. The following theorem handles the general case and shows how
the concept of affine dependence is analogous to that of linear dependence. Parts (c)
and (d) give useful methods for determining whether a set is affinely dependent. Recall
from Section 8.1 that if $\mathbf{v}$ is in $\mathbb{R}^n$, then the vector $\tilde{\mathbf{v}}$ in $\mathbb{R}^{n+1}$ denotes the homogeneous
form of $\mathbf{v}$.

THEOREM 5

Given an indexed set $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in $\mathbb{R}^n$, with $p \geq 2$, the following statements
are logically equivalent. That is, either they are all true statements or they
are all false.

a. $S$ is affinely dependent.

b. One of the points in $S$ is an affine combination of the other points in $S$.

c. The set $\{\mathbf{v}_2 - \mathbf{v}_1, \ldots, \mathbf{v}_p - \mathbf{v}_1\}$ in $\mathbb{R}^n$ is linearly dependent.

d. The set $\{\tilde{\mathbf{v}}_1, \ldots, \tilde{\mathbf{v}}_p\}$ of homogeneous forms in $\mathbb{R}^{n+1}$ is linearly dependent.


PROOF Suppose statement (a) is true, and let $c_1, \ldots, c_p$ satisfy (1). By renaming the
points if necessary, one may assume that $c_1 \neq 0$ and divide both equations in (1) by $c_1$,
so that $1 + (c_2/c_1) + \cdots + (c_p/c_1) = 0$ and

$$\mathbf{v}_1 = (-c_2/c_1)\mathbf{v}_2 + \cdots + (-c_p/c_1)\mathbf{v}_p \tag{2}$$

Note that the coefficients on the right side of (2) sum to 1. Thus (a) implies (b). Now,
suppose that (b) is true. By renaming the points if necessary, one may assume that
$\mathbf{v}_1 = c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p$, where $c_2 + \cdots + c_p = 1$. Then

$$(c_2 + \cdots + c_p)\mathbf{v}_1 = c_2\mathbf{v}_2 + \cdots + c_p\mathbf{v}_p \tag{3}$$

and

$$c_2(\mathbf{v}_2 - \mathbf{v}_1) + \cdots + c_p(\mathbf{v}_p - \mathbf{v}_1) = \mathbf{0} \tag{4}$$

Not all of $c_2, \ldots, c_p$ can be zero because they sum to 1. So (b) implies (c).

Next, if (c) is true, then there exist weights $c_2, \ldots, c_p$, not all zero, such that (4)
holds. Rewrite (4) as (3) and set $c_1 = -(c_2 + \cdots + c_p)$. Then $c_1 + \cdots + c_p = 0$. Thus
(3) shows that (1) is true. So (c) implies (a), which proves that (a), (b), and (c) are
logically equivalent. Finally, (d) is equivalent to (a) because the two equations in (1)
are equivalent to the following equation involving the homogeneous forms of the points
in $S$:

$$c_1\begin{bmatrix} \mathbf{v}_1 \\ 1 \end{bmatrix} + \cdots + c_p\begin{bmatrix} \mathbf{v}_p \\ 1 \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ 0 \end{bmatrix}$$

■

In statement (c) of Theorem 5, $\mathbf{v}_1$ could be replaced by any of the other points in
the list $\mathbf{v}_1, \ldots, \mathbf{v}_p$. Only the notation in the proof would change. So, to test whether a
set is affinely dependent, subtract one point in the set from the other points, and check
whether the translated set of $p - 1$ points is linearly dependent.
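The test just described (translate by one point, then check linear dependence) can be sketched as follows. The function names `rank` and `is_affinely_dependent` are assumptions made for this illustration; the rank is computed by ordinary Gaussian elimination over exact fractions, with the translated points stored as rows.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pr = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_affinely_dependent(points):
    """True if v2 - v1, ..., vp - v1 are linearly dependent (statement (c))."""
    base = points[0]
    translated = [[Fraction(a) - Fraction(b) for a, b in zip(p, base)]
                  for p in points[1:]]
    if not translated:
        return False          # a single point is affinely independent
    return rank(translated) < len(translated)

# Sample points in R^3; the fourth equals -4*v1 + 2*v2 + 3*v3.
v1, v2, v3 = [1, 3, 7], [2, 7, Fraction(13, 2)], [0, 4, 7]
v4 = [0, 14, 6]
print(is_affinely_dependent([v1, v2, v3]), is_affinely_dependent([v1, v2, v3, v4]))
```

Choosing a different base point would change the translated vectors but not the verdict, in line with the remark that $\mathbf{v}_1$ may be replaced by any other point of the list.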











EXAMPLE 1 The affine hull of two distinct points p and q is a line. If a third point 
r is on the line, then {p, q, r} is an affinely dependent set. If a point s is not on the 
line through p and q, then these three points are not collinear and {p, q, s} is an affinely 
independent set. See Fig. 1. ■ 



FIGURE 1 {p, q, r} is affinely dependent. 



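EXAMPLE 2 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 7 \\ 6.5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 0 \\ 4 \\ 7 \end{bmatrix}$, and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.
Determine whether $S$ is affinely independent.

SOLUTION Compute $\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ 4 \\ -.5 \end{bmatrix}$ and $\mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$. These two points
are not multiples and hence form a linearly independent set, $S'$. So all statements in
Theorem 5 are false, and $S$ is affinely independent. Figure 2 shows $S$ and the translated
set $S'$. Notice that $\operatorname{Span} S'$ is a plane through the origin and $\operatorname{aff} S$ is a parallel plane
through $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. (Only a portion of each plane is shown here, of course.) ■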



FIGURE 2 An affinely independent set {vi, V2, V3}. 


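EXAMPLE 3 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 3 \\ 7 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ 7 \\ 6.5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 0 \\ 4 \\ 7 \end{bmatrix}$, and $\mathbf{v}_4 = \begin{bmatrix} 0 \\ 14 \\ 6 \end{bmatrix}$, and let
$S = \{\mathbf{v}_1, \ldots, \mathbf{v}_4\}$. Is $S$ affinely dependent?

SOLUTION Compute $\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ 4 \\ -.5 \end{bmatrix}$, $\mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$, and $\mathbf{v}_4 - \mathbf{v}_1 = \begin{bmatrix} -1 \\ 11 \\ -1 \end{bmatrix}$,
and row reduce the matrix:

$$\begin{bmatrix} 1 & -1 & -1 \\ 4 & 1 & 11 \\ -.5 & 0 & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & -1 \\ 0 & 5 & 15 \\ 0 & -.5 & -1.5 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & -1 \\ 0 & 5 & 15 \\ 0 & 0 & 0 \end{bmatrix}$$

Recall from Section 4.6 (or Section 2.8) that the columns are linearly dependent because
not every column is a pivot column; so $\mathbf{v}_2 - \mathbf{v}_1$, $\mathbf{v}_3 - \mathbf{v}_1$, and $\mathbf{v}_4 - \mathbf{v}_1$ are linearly
dependent. By statement (c) in Theorem 5, $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is affinely dependent. This
dependence can also be established using (d) in Theorem 5 instead of (c). ■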

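The calculations in Example 3 show that $\mathbf{v}_4 - \mathbf{v}_1$ is a linear combination of $\mathbf{v}_2 - \mathbf{v}_1$
and $\mathbf{v}_3 - \mathbf{v}_1$, which means that $\mathbf{v}_4 - \mathbf{v}_1$ is in $\operatorname{Span}\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1\}$. By Theorem 1 in
Section 8.1, $\mathbf{v}_4$ is in $\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. In fact, complete row reduction of the matrix in
Example 3 would show that

$$\mathbf{v}_4 - \mathbf{v}_1 = 2(\mathbf{v}_2 - \mathbf{v}_1) + 3(\mathbf{v}_3 - \mathbf{v}_1) \tag{5}$$

$$\mathbf{v}_4 = -4\mathbf{v}_1 + 2\mathbf{v}_2 + 3\mathbf{v}_3 \tag{6}$$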


See Fig. 3. 



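FIGURE 3 $\mathbf{v}_4$ is in the plane $\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

Figure 3 shows grids on both $\operatorname{Span}\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1\}$ and $\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. The grid
on $\operatorname{aff}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is based on (5). Another "coordinate system" can be based on (6), in
which the coefficients $-4$, $2$, and $3$ are called affine or barycentric coordinates of $\mathbf{v}_4$.

Barycentric Coordinates

The definition of barycentric coordinates depends on the following affine version of the
Unique Representation Theorem in Section 4.4. See Exercise 17 in this section for the
proof.

THEOREM 6

Let $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ be an affinely independent set in $\mathbb{R}^n$. Then each $\mathbf{p}$ in $\operatorname{aff} S$
has a unique representation as an affine combination of $\mathbf{v}_1, \ldots, \mathbf{v}_k$. That is, for
each $\mathbf{p}$ there exists a unique set of scalars $c_1, \ldots, c_k$ such that

$$\mathbf{p} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \quad \text{and} \quad c_1 + \cdots + c_k = 1 \tag{7}$$

DEFINITION

Let $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ be an affinely independent set. Then for each point $\mathbf{p}$ in
$\operatorname{aff} S$, the coefficients $c_1, \ldots, c_k$ in the unique representation (7) of $\mathbf{p}$ are called
the barycentric (or, sometimes, affine) coordinates of $\mathbf{p}$.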


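Observe that (7) is equivalent to the single equation

$$\begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix} = c_1\begin{bmatrix} \mathbf{v}_1 \\ 1 \end{bmatrix} + \cdots + c_k\begin{bmatrix} \mathbf{v}_k \\ 1 \end{bmatrix} \tag{8}$$

involving the homogeneous forms of the points. Row reduction of the augmented matrix
$[\,\tilde{\mathbf{v}}_1 \ \cdots \ \tilde{\mathbf{v}}_k \ \tilde{\mathbf{p}}\,]$ for (8) produces the barycentric coordinates of $\mathbf{p}$.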
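EXAMPLE 4 Let $\mathbf{a} = \begin{bmatrix} 1 \\ 7 \end{bmatrix}$, $\mathbf{b} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$, $\mathbf{c} = \begin{bmatrix} 9 \\ 3 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 5 \\ 3 \end{bmatrix}$. Find the barycentric
coordinates of $\mathbf{p}$ determined by the affinely independent set $\{\mathbf{a}, \mathbf{b}, \mathbf{c}\}$.

SOLUTION Row reduce the augmented matrix of points in homogeneous form, moving
the last row of ones to the top to simplify the arithmetic:

$$[\,\tilde{\mathbf{a}} \ \tilde{\mathbf{b}} \ \tilde{\mathbf{c}} \ \tilde{\mathbf{p}}\,] = \begin{bmatrix} 1 & 3 & 9 & 5 \\ 7 & 0 & 3 & 3 \\ 1 & 1 & 1 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 3 & 9 & 5 \\ 7 & 0 & 3 & 3 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & \tfrac{1}{4} \\ 0 & 1 & 0 & \tfrac{1}{3} \\ 0 & 0 & 1 & \tfrac{5}{12} \end{bmatrix}$$

The coordinates are $\tfrac{1}{4}$, $\tfrac{1}{3}$, and $\tfrac{5}{12}$, so $\mathbf{p} = \tfrac{1}{4}\mathbf{a} + \tfrac{1}{3}\mathbf{b} + \tfrac{5}{12}\mathbf{c}$.

■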
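For a triangle in $\mathbb{R}^2$, barycentric coordinates can also be obtained as ratios of signed triangle areas instead of by row reduction. The sketch below swaps in that determinant-based technique; the function names `signed_area` and `barycentric` are assumptions made here, and the coordinates are assumed to be integers or `Fraction` values so that the arithmetic stays exact.

```python
from fractions import Fraction

def signed_area(a, b, c):
    """Signed area of triangle abc in R^2 (positive when a, b, c run counterclockwise)."""
    return Fraction((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]), 2)

def barycentric(p, a, b, c):
    """Barycentric coordinates (r, s, t) of p with respect to a, b, c."""
    total = signed_area(a, b, c)
    if total == 0:
        raise ValueError("a, b, c are collinear, hence affinely dependent")
    r = signed_area(p, b, c) / total   # weight on a: triangle pbc over abc
    s = signed_area(a, p, c) / total   # weight on b: triangle apc over abc
    t = signed_area(a, b, p) / total   # weight on c: triangle abp over abc
    return r, s, t

a, b, c, p = (1, 7), (3, 0), (9, 3), (5, 3)
r, s, t = barycentric(p, a, b, c)
print(r, s, t)   # 1/4 1/3 5/12
```

Because the areas are signed, the same formulas return negative coordinates for points outside the triangle.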


Barycentric coordinates have both physical and geometric interpretations. They
were originally defined by A. F. Moebius in 1827 for a point $\mathbf{p}$ inside a triangular
region with vertices $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$. He wrote that the barycentric coordinates of $\mathbf{p}$ are
three nonnegative numbers $m_a$, $m_b$, and $m_c$ such that $\mathbf{p}$ is the center of mass of a system
consisting of the triangle (with no mass) and masses $m_a$, $m_b$, and $m_c$ at the corresponding
vertices. The masses are uniquely determined by requiring that their sum be 1. This
view is still useful in physics today.¹

Figure 4 gives a geometric interpretation to the barycentric coordinates in Example
4, showing the triangle $\triangle\mathbf{abc}$ and three small triangles $\triangle\mathbf{pbc}$, $\triangle\mathbf{apc}$, and $\triangle\mathbf{abp}$. The
areas of the small triangles are proportional to the barycentric coordinates of $\mathbf{p}$. In fact,

$$\begin{aligned} \operatorname{area}(\triangle\mathbf{pbc}) &= \tfrac{1}{4} \cdot \operatorname{area}(\triangle\mathbf{abc}) \\ \operatorname{area}(\triangle\mathbf{apc}) &= \tfrac{1}{3} \cdot \operatorname{area}(\triangle\mathbf{abc}) \\ \operatorname{area}(\triangle\mathbf{abp}) &= \tfrac{5}{12} \cdot \operatorname{area}(\triangle\mathbf{abc}) \end{aligned} \tag{9}$$

FIGURE 4 $\mathbf{p} = r\mathbf{a} + s\mathbf{b} + t\mathbf{c}$. Here, $r = \tfrac{1}{4}$, $s = \tfrac{1}{3}$, $t = \tfrac{5}{12}$.

The formulas in Fig. 4 are verified in Exercises 21-23. Analogous equalities for
volumes of tetrahedrons hold for the case when $\mathbf{p}$ is a point inside a tetrahedron in $\mathbb{R}^3$,
with vertices $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and $\mathbf{d}$.

¹ See Exercise 29 in Section 1.3. In astronomy, however, "barycentric coordinates" usually refer to ordinary
$\mathbb{R}^3$ coordinates of points in what is now called the International Celestial Reference System, a Cartesian
coordinate system for outer space, with the origin at the center of mass (the barycenter) of the solar system.
When a point is not inside the triangle (or tetrahedron), some or all of the barycentric
coordinates will be negative. The case of a triangle is illustrated in Fig. 5, for vertices a, 
b, c, and coordinate values r, s, t, as above. The points on the line through b and c, for 
instance, have r = 0 because they are affine combinations of only b and c. The parallel 
line through a identifies points with r = 1. 



FIGURE 5 Barycentric coordinates
for points in $\operatorname{aff}\{\mathbf{a}, \mathbf{b}, \mathbf{c}\}$.

Barycentric Coordinates in Computer Graphics 

When working with geometric objects in a computer graphics program, a designer may 
use a “wire-frame” approximation to an object at certain key points in the process 
of creating a realistic final image.² For instance, if the surface of part of an object
consists of small flat triangular surfaces, then a graphics program can easily add color, 
lighting, and shading to each small surface when that information is known only at the 
vertices. Barycentric coordinates provide the tool for smoothly interpolating the vertex 
information over the interior of a triangle. The interpolation at a point is simply the 
linear combination of the vertex values using the barycentric coordinates as weights. 

Colors on a computer screen are often described by RGB coordinates. A triple 
(r, g, b) indicates the amount of each color—red, green, and blue—with the parameters 
varying from 0 to 1. For example, pure red is (1,0,0), white is (1,1,1), and black is 
(0,0,0). 



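EXAMPLE 5 Let $\mathbf{v}_1 = \begin{bmatrix} 3 \\ 1 \\ 5 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 4 \\ 3 \\ 4 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 1 \\ 5 \\ 1 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 3 \\ 3 \\ 3.5 \end{bmatrix}$. The colors
at the vertices $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ of a triangle are magenta $(1, 0, 1)$, light magenta
$(1, .4, 1)$, and purple $(.6, 0, 1)$, respectively. Find the interpolated color at $\mathbf{p}$. See Fig. 6.

² The Introductory Example for Chapter 2 shows a wire-frame model of a Boeing 777 airplane, used to
visualize the flow of air over the surface of the plane.

SOLUTION First, find the barycentric coordinates of $\mathbf{p}$. Here is the calculation using
homogeneous forms of the points, with the first step moving row 4 to row 1:

$$[\,\tilde{\mathbf{v}}_1 \ \tilde{\mathbf{v}}_2 \ \tilde{\mathbf{v}}_3 \ \tilde{\mathbf{p}}\,] \sim \begin{bmatrix} 1 & 1 & 1 & 1 \\ 3 & 4 & 1 & 3 \\ 1 & 3 & 5 & 3 \\ 5 & 4 & 1 & 3.5 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & .25 \\ 0 & 1 & 0 & .50 \\ 0 & 0 & 1 & .25 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

So $\mathbf{p} = .25\mathbf{v}_1 + .5\mathbf{v}_2 + .25\mathbf{v}_3$. Use the barycentric coordinates of $\mathbf{p}$ to make a linear
combination of the color data. The RGB values for $\mathbf{p}$ are

$$.25\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + .50\begin{bmatrix} 1 \\ .4 \\ 1 \end{bmatrix} + .25\begin{bmatrix} .6 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} .9 \\ .2 \\ 1 \end{bmatrix} \begin{matrix} \text{red} \\ \text{green} \\ \text{blue} \end{matrix}$$

■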
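The interpolation step is a weighted sum of the vertex data, with the barycentric coordinates as weights. A minimal sketch follows; the function name `interpolate` is an assumption made here, and exact fractions are used only so that the printed values are free of rounding noise.

```python
from fractions import Fraction

def interpolate(weights, vertex_values):
    """Linear combination of the vertex values with the given barycentric weights."""
    if sum(weights) != 1:
        raise ValueError("barycentric coordinates must sum to 1")
    size = len(vertex_values[0])
    return [sum(w * v[i] for w, v in zip(weights, vertex_values)) for i in range(size)]

# Barycentric coordinates of p and RGB colors at the three vertices
weights = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
colors = [(1, 0, 1),                   # magenta
          (1, Fraction(2, 5), 1),      # light magenta
          (Fraction(3, 5), 0, 1)]      # purple
rgb = interpolate(weights, colors)
print([float(x) for x in rgb])   # [0.9, 0.2, 1.0]
```

The same function would interpolate any per-vertex quantity (lighting or shading values, for instance), since nothing in it is specific to color.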


One of the last steps in preparing a graphics scene for display on a computer screen 
is to remove “hidden surfaces” that should not be visible on the screen. Imagine the 
viewing screen as consisting of, say, a million pixels, and consider a ray or “line of 
sight” from the viewer’s eye through a pixel and into the collection of objects that make 
up the 3D display. The color and other information displayed in the pixel on the screen 
should come from the object that the ray first intersects. See Fig. 7. When the objects in 
the graphics scene are approximated by wire frames with triangular patches, the hidden 
surface problem can be solved using barycentric coordinates. 



FIGURE 7 A ray from the eye through the screen to 
the nearest object. 


The mathematics for finding the ray-triangle intersections can also be used to perform
extremely realistic shading of objects. Currently, this ray-tracing method is too
slow for real-time rendering, but recent advances in hardware implementation may
change that in the future.³


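EXAMPLE 6 Let

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ -6 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 8 \\ 1 \\ -4 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 5 \\ 11 \\ -2 \end{bmatrix}, \quad
\mathbf{a} = \begin{bmatrix} 0 \\ 0 \\ 10 \end{bmatrix}, \quad
\mathbf{b} = \begin{bmatrix} .7 \\ .4 \\ -3 \end{bmatrix}
\]

and $\mathbf{x}(t) = \mathbf{a} + t\mathbf{b}$ for $t \ge 0$. Find the point where the ray $\mathbf{x}(t)$ intersects the plane that
contains the triangle with vertices $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Is this point inside the triangle?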


³ See Joshua Fender and Jonathan Rose, “A High-Speed Ray Tracing Engine Built on a Field-Programmable
System,” in Proc. Int. Conf. on Field-Programmable Technology, IEEE (2003). (A single processor can
calculate 600 million ray-triangle intersections per second.)

SOLUTION The plane is aff $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. A typical point in this plane may be written
as $(1 - c_2 - c_3)\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ for some $c_2$ and $c_3$. (The weights in this combination
sum to 1.) The ray $\mathbf{x}(t)$ intersects the plane when $c_2$, $c_3$, and $t$ satisfy

\[ (1 - c_2 - c_3)\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{a} + t\mathbf{b} \]

Rearrange this as $c_2(\mathbf{v}_2 - \mathbf{v}_1) + c_3(\mathbf{v}_3 - \mathbf{v}_1) + t(-\mathbf{b}) = \mathbf{a} - \mathbf{v}_1$. In matrix form,

\[
\begin{bmatrix} \mathbf{v}_2 - \mathbf{v}_1 & \mathbf{v}_3 - \mathbf{v}_1 & -\mathbf{b} \end{bmatrix}
\begin{bmatrix} c_2 \\ c_3 \\ t \end{bmatrix} = \mathbf{a} - \mathbf{v}_1
\]


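For the specific points given here,

\[
\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 7 \\ 0 \\ 2 \end{bmatrix}, \quad
\mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} 4 \\ 10 \\ 4 \end{bmatrix}, \quad
\mathbf{a} - \mathbf{v}_1 = \begin{bmatrix} -1 \\ -1 \\ 16 \end{bmatrix}
\]

Row reduction of the augmented matrix above produces

\[
\begin{bmatrix} 7 & 4 & -.7 & -1 \\ 0 & 10 & -.4 & -1 \\ 2 & 4 & 3 & 16 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 0 & .3 \\ 0 & 1 & 0 & .1 \\ 0 & 0 & 1 & 5 \end{bmatrix}
\]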

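Thus $c_2 = .3$, $c_3 = .1$, and $t = 5$. Therefore, the intersection point is

\[
\mathbf{x}(5) = \mathbf{a} + 5\mathbf{b}
= \begin{bmatrix} 0 \\ 0 \\ 10 \end{bmatrix} + 5 \begin{bmatrix} .7 \\ .4 \\ -3 \end{bmatrix}
= \begin{bmatrix} 3.5 \\ 2.0 \\ -5.0 \end{bmatrix}
\]

Also,

\[
\mathbf{x}(5) = (1 - .3 - .1)\mathbf{v}_1 + .3\mathbf{v}_2 + .1\mathbf{v}_3
= .6 \begin{bmatrix} 1 \\ 1 \\ -6 \end{bmatrix}
+ .3 \begin{bmatrix} 8 \\ 1 \\ -4 \end{bmatrix}
+ .1 \begin{bmatrix} 5 \\ 11 \\ -2 \end{bmatrix}
= \begin{bmatrix} 3.5 \\ 2.0 \\ -5.0 \end{bmatrix}
\]

The intersection point is inside the triangle because the barycentric weights for $\mathbf{x}(5)$ are
all positive. ■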
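The computation in Example 6 can be sketched as follows. This is a minimal illustration, not a production ray tracer: it solves the 3 × 3 system by Cramer's rule rather than by row reduction, and the names `det3`, `ray_triangle`, and `columns_to_rows` are hypothetical. Exact fractions are used so that the results can be compared directly with the values in the text.

```python
from fractions import Fraction as F

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def columns_to_rows(cols):
    return [[col[k] for col in cols] for k in range(3)]

def ray_triangle(v1, v2, v3, a, b):
    """Solve c2*(v2 - v1) + c3*(v3 - v1) + t*(-b) = a - v1.

    Returns (c1, c2, c3, t), or None when the ray is parallel to the plane.
    Entries are expected to be Fractions (or floats) so that division is exact
    (or at least meaningful).
    """
    cols = [
        [v2[k] - v1[k] for k in range(3)],
        [v3[k] - v1[k] for k in range(3)],
        [-b[k] for k in range(3)],
    ]
    rhs = [a[k] - v1[k] for k in range(3)]
    d = det3(columns_to_rows(cols))
    if d == 0:
        return None
    solution = []
    for j in range(3):
        # Cramer's rule: replace column j by the right-hand side.
        replaced = cols[:j] + [rhs] + cols[j + 1:]
        solution.append(det3(columns_to_rows(replaced)) / d)
    c2, c3, t = solution
    return 1 - c2 - c3, c2, c3, t

v1 = [F(1), F(1), F(-6)]
v2 = [F(8), F(1), F(-4)]
v3 = [F(5), F(11), F(-2)]
a = [F(0), F(0), F(10)]
b = [F(7, 10), F(2, 5), F(-3)]

c1, c2, c3, t = ray_triangle(v1, v2, v3, a, b)
point = [a[k] + t * b[k] for k in range(3)]
# The ray hits the triangle when t >= 0 and no barycentric weight is negative.
hits_triangle = t >= 0 and min(c1, c2, c3) >= 0
print(c1, c2, c3, t, point, hits_triangle)
```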


PRACTICE PROBLEMS

1. Describe a fast way to determine when three points are collinear.

2. The points $\mathbf{v}_1 = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$, and $\mathbf{v}_4 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ form an affinely dependent set. Find weights $c_1, \ldots, c_4$ that produce an affine dependence relation
$c_1\mathbf{v}_1 + \cdots + c_4\mathbf{v}_4 = \mathbf{0}$, where $c_1 + \cdots + c_4 = 0$ and not all $c_i$ are zero. [Hint: See
the end of the proof of Theorem 5.]

8.2 EXERCISES


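In Exercises 1-6, determine if the set of points is affinely dependent. (See Practice Problem 2.) If so, construct an affine
dependence relation for the points.

3. $\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$, $\begin{bmatrix} -2 \\ -4 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 2 \\ -1 \\ 11 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 15 \\ -9 \end{bmatrix}$

4. $\begin{bmatrix} -2 \\ 5 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 0 \\ -3 \\ 7 \end{bmatrix}$, $\begin{bmatrix} 1 \\ -2 \\ -6 \end{bmatrix}$, $\begin{bmatrix} -2 \\ 7 \\ -3 \end{bmatrix}$

5. $\begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} -1 \\ 5 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 5 \\ -3 \end{bmatrix}$

6. $\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 0 \\ -1 \\ -2 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 5 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 3 \\ 5 \\ 0 \end{bmatrix}$

In Exercises 7 and 8, find the barycentric coordinates of $\mathbf{p}$ with
respect to the affinely independent set of points that precedes it.

7. $\begin{bmatrix} 1 \\ -1 \\ 2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 2 \\ -2 \\ 0 \end{bmatrix}$, $\mathbf{p} = \begin{bmatrix} 5 \\ 4 \\ -2 \\ 2 \end{bmatrix}$

8. $\begin{bmatrix} 0 \\ 1 \\ -2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \\ 0 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 4 \\ -6 \\ 5 \end{bmatrix}$, $\mathbf{p} = \begin{bmatrix} -1 \\ 1 \\ -4 \\ 0 \end{bmatrix}$

In Exercises 9 and 10, mark each statement True or False. Justify
each answer.

9. a. If $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are in $\mathbb{R}^n$ and if the set $\{\mathbf{v}_1 - \mathbf{v}_2, \mathbf{v}_3 - \mathbf{v}_2, \ldots, \mathbf{v}_p - \mathbf{v}_2\}$ is linearly dependent, then $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is
affinely dependent. (Read this carefully.)

b. If $\mathbf{v}_1, \ldots, \mathbf{v}_p$ are in $\mathbb{R}^n$ and if the set of homogeneous
forms $\{\tilde{\mathbf{v}}_1, \ldots, \tilde{\mathbf{v}}_p\}$ in $\mathbb{R}^{n+1}$ is linearly independent, then
$\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is affinely dependent.

c. A finite set of points $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is affinely dependent if
there exist real numbers $c_1, \ldots, c_k$, not all zero, such that
$c_1 + \cdots + c_k = 1$ and $c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}$.

d. If $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is an affinely independent set in $\mathbb{R}^n$
and if $\mathbf{p}$ in $\mathbb{R}^n$ has a negative barycentric coordinate
determined by $S$, then $\mathbf{p}$ is not in aff $S$.

e. If $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, $\mathbf{a}$, and $\mathbf{b}$ are in $\mathbb{R}^3$ and if a ray $\mathbf{a} + t\mathbf{b}$ for
$t \ge 0$ intersects the triangle with vertices $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$,
then the barycentric coordinates of the intersection point
are all nonnegative.

10. a. If $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is an affinely dependent set in $\mathbb{R}^n$, then the
set $\{\tilde{\mathbf{v}}_1, \ldots, \tilde{\mathbf{v}}_p\}$ in $\mathbb{R}^{n+1}$ of homogeneous forms may be
linearly independent.

b. If $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$ are in $\mathbb{R}^3$ and if the set
$\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1, \mathbf{v}_4 - \mathbf{v}_1\}$ is linearly independent, then
$\{\mathbf{v}_1, \ldots, \mathbf{v}_4\}$ is affinely independent.

c. Given $S = \{\mathbf{b}_1, \ldots, \mathbf{b}_k\}$ in $\mathbb{R}^n$, each $\mathbf{p}$ in aff $S$ has
a unique representation as an affine combination of
$\mathbf{b}_1, \ldots, \mathbf{b}_k$.

d. When color information is specified at each vertex $\mathbf{v}_1$, $\mathbf{v}_2$,
$\mathbf{v}_3$ of a triangle in $\mathbb{R}^3$, then the color may be interpolated
at a point $\mathbf{p}$ in aff $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ using the barycentric coordinates of $\mathbf{p}$.

e. If $T$ is a triangle in $\mathbb{R}^2$ and if a point $\mathbf{p}$ is on an edge of
the triangle, then the barycentric coordinates of $\mathbf{p}$ (for this
triangle) are not all positive.

11. Explain why any set of five or more points in $\mathbb{R}^3$ must be
affinely dependent.

12. Show that a set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ is affinely dependent when
$p \ge n + 2$.

13. Use only the definition of affine dependence to show that an
indexed set $\{\mathbf{v}_1, \mathbf{v}_2\}$ in $\mathbb{R}^n$ is affinely dependent if and only if
$\mathbf{v}_1 = \mathbf{v}_2$.

14. The conditions for affine dependence are stronger than those
for linear dependence, so an affinely dependent set is automatically
linearly dependent. Also, a linearly independent set
cannot be affinely dependent and therefore must be affinely
independent. Construct two linearly dependent indexed sets
$S_1$ and $S_2$ in $\mathbb{R}^2$ such that $S_1$ is affinely dependent and $S_2$
is affinely independent. In each case, the set should contain
either one, two, or three nonzero points.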


15. Let vi = 

{vi,v 2 ,v 3 }. 

a. Show that the set S is affinely independent. 

b. Find the barycentric coordinates of Pi = 



0 


2 

, v 2 = 

4 

,v 3 = 

0 



1 


-2 


1 


1 

P 2 = 

2 

， P 3 = 

1 

， P 4 = 

-1 

,and p 5 = 

1 


with respect to S . 


c. Let $T$ be the triangle with vertices $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. When
the sides of $T$ are extended, the lines divide $\mathbb{R}^2$ into seven
regions. See Fig. 8. Note the signs of the barycentric
coordinates of the points in each region. For example, $\mathbf{p}_5$
is inside the triangle $T$ and all its barycentric coordinates
are positive. Point $\mathbf{p}_1$ has coordinates $(-, +, +)$. Its
third coordinate is positive because $\mathbf{p}_1$ is on the $\mathbf{v}_3$ side
of the line through $\mathbf{v}_1$ and $\mathbf{v}_2$. Its first coordinate is
negative because $\mathbf{p}_1$ is opposite the $\mathbf{v}_1$ side of the line
through $\mathbf{v}_2$ and $\mathbf{v}_3$. Point $\mathbf{p}_2$ is on the $\mathbf{v}_2\mathbf{v}_3$ edge of $T$. Its
coordinates are $(0, +, +)$. Without calculating the actual
values, determine the signs of the barycentric coordinates
of points $\mathbf{p}_6$, $\mathbf{p}_7$, and $\mathbf{p}_8$ as shown in Fig. 8.



16. Let vi = 


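$\mathbf{p}_2 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$, $\mathbf{p}_3 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$, $\mathbf{p}_4 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$, $\mathbf{p}_5 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}$, $\mathbf{p}_6 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\mathbf{p}_7 = \begin{bmatrix} 6 \\ 4 \end{bmatrix}$, and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

a. Show that the set $S$ is affinely independent.

b. Find the barycentric coordinates of $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ with
respect to $S$.

c. On graph paper, sketch the triangle $T$ with vertices $\mathbf{v}_1$,
$\mathbf{v}_2$, and $\mathbf{v}_3$, extend the sides as in Fig. 5, and plot the
points $\mathbf{p}_4$, $\mathbf{p}_5$, $\mathbf{p}_6$, and $\mathbf{p}_7$. Without calculating the actual
values, determine the signs of the barycentric coordinates
of points $\mathbf{p}_4$, $\mathbf{p}_5$, $\mathbf{p}_6$, and $\mathbf{p}_7$.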

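17. Prove Theorem 6 for an affinely independent set
$S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ in $\mathbb{R}^n$. [Hint: One method is to mimic
the proof of Theorem 7 in Section 4.4.]

18. Let $T$ be a tetrahedron in “standard” position, with three
edges along the three positive coordinate axes in $\mathbb{R}^3$, and
suppose the vertices are $a\mathbf{e}_1$, $b\mathbf{e}_2$, $c\mathbf{e}_3$, and $\mathbf{0}$, where
$[\,\mathbf{e}_1 \ \mathbf{e}_2 \ \mathbf{e}_3\,] = I_3$. Find formulas for the barycentric coordinates
of an arbitrary point $\mathbf{p}$ in $\mathbb{R}^3$.

19. Let $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ be an affinely dependent set of points in $\mathbb{R}^n$
and let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Show that
$\{f(\mathbf{p}_1), f(\mathbf{p}_2), f(\mathbf{p}_3)\}$ is affinely dependent in $\mathbb{R}^m$.

20. Suppose that $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ is an affinely independent set in $\mathbb{R}^n$
and $\mathbf{q}$ is an arbitrary point in $\mathbb{R}^n$. Show that the translated set
$\{\mathbf{p}_1 + \mathbf{q}, \mathbf{p}_2 + \mathbf{q}, \mathbf{p}_3 + \mathbf{q}\}$ is also affinely independent.

In Exercises 21-24, $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are noncollinear points in $\mathbb{R}^2$ and
$\mathbf{p}$ is any other point in $\mathbb{R}^2$. Let $\triangle\mathbf{abc}$ denote the closed triangular
region determined by $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, and let $\triangle\mathbf{pbc}$ be the region
determined by $\mathbf{p}$, $\mathbf{b}$, and $\mathbf{c}$. For convenience, assume that $\mathbf{a}$, $\mathbf{b}$,
and $\mathbf{c}$ are arranged so that $\det[\,\tilde{\mathbf{a}} \ \tilde{\mathbf{b}} \ \tilde{\mathbf{c}}\,]$ is positive, where $\tilde{\mathbf{a}}$, $\tilde{\mathbf{b}}$,
and $\tilde{\mathbf{c}}$ are the standard homogeneous forms for the points.

21. Show that the area of $\triangle\mathbf{abc}$ is $\det[\,\tilde{\mathbf{a}} \ \tilde{\mathbf{b}} \ \tilde{\mathbf{c}}\,]/2$. [Hint:
Consult Sections 3.2 and 3.3, including the Exercises.]

22. Let $\mathbf{p}$ be a point on the line through $\mathbf{a}$ and $\mathbf{b}$. Show that
$\det[\,\tilde{\mathbf{a}} \ \tilde{\mathbf{b}} \ \tilde{\mathbf{p}}\,] = 0$.

23. Let $\mathbf{p}$ be any point in the interior of $\triangle\mathbf{abc}$, with barycentric
coordinates $(r, s, t)$, so that

\[
\begin{bmatrix} \tilde{\mathbf{a}} & \tilde{\mathbf{b}} & \tilde{\mathbf{c}} \end{bmatrix}
\begin{bmatrix} r \\ s \\ t \end{bmatrix} = \tilde{\mathbf{p}}
\]

Use Exercise 21 and a fact about determinants (Chapter 3) to
show that

r = (area of $\triangle\mathbf{pbc}$)/(area of $\triangle\mathbf{abc}$)
s = (area of $\triangle\mathbf{apc}$)/(area of $\triangle\mathbf{abc}$)
t = (area of $\triangle\mathbf{abp}$)/(area of $\triangle\mathbf{abc}$)

24. Take $\mathbf{q}$ on the line segment from $\mathbf{b}$ to $\mathbf{c}$ and consider the line
through $\mathbf{q}$ and $\mathbf{a}$, which may be written as $\mathbf{p} = (1 - x)\mathbf{q} + x\mathbf{a}$
for all real $x$. Show that, for each $x$, $\det[\,\tilde{\mathbf{p}} \ \tilde{\mathbf{b}} \ \tilde{\mathbf{c}}\,] = x \cdot \det[\,\tilde{\mathbf{a}} \ \tilde{\mathbf{b}} \ \tilde{\mathbf{c}}\,]$. From this and earlier work, conclude
that the parameter $x$ is the first barycentric coordinate of $\mathbf{p}$.
However, by construction, the parameter $x$ also determines
the relative distance between $\mathbf{p}$ and $\mathbf{q}$ along the segment from
$\mathbf{q}$ to $\mathbf{a}$. (When $x = 1$, $\mathbf{p} = \mathbf{a}$.) When this fact is applied to
Example 5, it shows that the colors at vertex $\mathbf{a}$ and the point $\mathbf{q}$
are smoothly interpolated as $\mathbf{p}$ moves along the line between
$\mathbf{a}$ and $\mathbf{q}$.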


SOLUTIONS TO PRACTICE PROBLEMS 


1. From Example 1, the problem is to determine if the points are affinely dependent. 
Use the method of Example 2 and subtract one point from the other two. If one of 
these two new points is a multiple of the other, the original three points lie on a line. 

2. The proof of Theorem 5 essentially points out that an affine dependence relation
among points corresponds to a linear dependence relation among the homogeneous
forms of the points, using the same weights. So, row reduce:

\[
\begin{bmatrix} \tilde{\mathbf{v}}_1 & \tilde{\mathbf{v}}_2 & \tilde{\mathbf{v}}_3 & \tilde{\mathbf{v}}_4 \end{bmatrix}
= \begin{bmatrix} 4 & 1 & 5 & 1 \\ 1 & 0 & 4 & 2 \\ 1 & 1 & 1 & 1 \end{bmatrix}
\sim \begin{bmatrix} 1 & 1 & 1 & 1 \\ 4 & 1 & 5 & 1 \\ 1 & 0 & 4 & 2 \end{bmatrix}
\sim \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 1.25 \\ 0 & 0 & 1 & .75 \end{bmatrix}
\]

View this matrix as the coefficient matrix for $A\mathbf{x} = \mathbf{0}$ with four variables. Then $x_4$
is free, $x_1 = x_4$, $x_2 = -1.25x_4$, and $x_3 = -.75x_4$. One solution is $x_1 = x_4 = 4$,
$x_2 = -5$, and $x_3 = -3$. A linear dependence among the homogeneous forms is
$4\tilde{\mathbf{v}}_1 - 5\tilde{\mathbf{v}}_2 - 3\tilde{\mathbf{v}}_3 + 4\tilde{\mathbf{v}}_4 = \mathbf{0}$. So $4\mathbf{v}_1 - 5\mathbf{v}_2 - 3\mathbf{v}_3 + 4\mathbf{v}_4 = \mathbf{0}$.

Another solution method is to translate the problem to the origin by subtracting
$\mathbf{v}_1$ from the other points, find a linear dependence relation among the translated
points, and then rearrange the terms. The amount of arithmetic involved is about
the same as in the approach shown above.
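The homogeneous-form method of Practice Problem 2 can be sketched in Python. This is a minimal illustration under stated assumptions: the helper names `rref` and `affine_dependence` are hypothetical, exact fractions replace the decimals 1.25 and .75, and when several free variables exist only the first is used, so the sketch returns one dependence relation rather than all of them.

```python
from fractions import Fraction as F

def rref(rows):
    """Return the reduced echelon form of a matrix and its pivot columns."""
    m = [[F(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def affine_dependence(points):
    """Weights c with sum(c_i * v_i) = 0 and sum(c_i) = 0, not all zero,
    or None when the points are affinely independent."""
    n = len(points[0])
    # Homogeneous forms: each point is a column with an extra entry 1.
    rows = [[p[k] for p in points] for k in range(n)] + [[1] * len(points)]
    m, pivots = rref(rows)
    free = [c for c in range(len(points)) if c not in pivots]
    if not free:
        return None
    f = free[0]
    weights = [F(0)] * len(points)
    weights[f] = F(1)
    for r, c in enumerate(pivots):
        weights[c] = -m[r][f]   # basic variable in terms of the free one
    return weights

pts = [[4, 1], [1, 0], [5, 4], [1, 2]]
relation = [4 * w for w in affine_dependence(pts)]  # scale to clear fractions
print(relation)
```

The same function gives a quick collinearity test for Practice Problem 1: three points are collinear exactly when `affine_dependence` returns a relation instead of `None`.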


8.3 CONVEX COMBINATIONS 

Section 8.1 considered special linear combinations of the form

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k, \quad \text{where } c_1 + c_2 + \cdots + c_k = 1 \]

This section further restricts the weights to be nonnegative.


DEFINITION


A convex combination of points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ in $\mathbb{R}^n$ is a linear combination of
the form

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k \]

such that $c_1 + c_2 + \cdots + c_k = 1$ and $c_i \ge 0$ for all $i$. The set of all convex
combinations of points in a set $S$ is called the convex hull of $S$, denoted by conv $S$.


The convex hull of a single point $\mathbf{v}_1$ is just the set $\{\mathbf{v}_1\}$, the same as the affine hull.
In other cases, the convex hull is properly contained in the affine hull. Recall that the
affine hull of distinct points $\mathbf{v}_1$ and $\mathbf{v}_2$ is the line

\[ \mathbf{y} = (1 - t)\mathbf{v}_1 + t\mathbf{v}_2, \quad \text{with } t \text{ in } \mathbb{R} \]

Because the weights in a convex combination are nonnegative, the points in conv $\{\mathbf{v}_1, \mathbf{v}_2\}$
may be written as

\[ \mathbf{y} = (1 - t)\mathbf{v}_1 + t\mathbf{v}_2, \quad \text{with } 0 \le t \le 1 \]

which is the line segment between $\mathbf{v}_1$ and $\mathbf{v}_2$, hereafter denoted by $\overline{\mathbf{v}_1\mathbf{v}_2}$.

If a set $S$ is affinely independent and if $\mathbf{p} \in$ aff $S$, then $\mathbf{p} \in$ conv $S$ if and only if the
barycentric coordinates of $\mathbf{p}$ are nonnegative. Example 1 shows a special situation in
which $S$ is much more than just affinely independent.

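EXAMPLE 1 Let

\[
\mathbf{v}_1 = \begin{bmatrix} 3 \\ 0 \\ 6 \\ -3 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -6 \\ 3 \\ 3 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 3 \\ 6 \\ 0 \\ 3 \end{bmatrix}, \quad
\mathbf{p}_1 = \begin{bmatrix} 0 \\ 3 \\ 3 \\ 0 \end{bmatrix}, \quad
\mathbf{p}_2 = \begin{bmatrix} -10 \\ 5 \\ 11 \\ -4 \end{bmatrix}
\]

and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Note that $S$ is an orthogonal set. Determine whether $\mathbf{p}_1$ is in
Span $S$, aff $S$, and conv $S$. Then do the same for $\mathbf{p}_2$.

SOLUTION If $\mathbf{p}_1$ is at least a linear combination of the points in $S$, then the weights
are easily found, because $S$ is an orthogonal set. Let $W$ be the subspace spanned by $S$.
A calculation as in Section 6.3 shows that the orthogonal projection of $\mathbf{p}_1$ onto $W$ is $\mathbf{p}_1$
itself:

\[
\begin{aligned}
\operatorname{proj}_W \mathbf{p}_1
&= \frac{\mathbf{p}_1 \cdot \mathbf{v}_1}{\mathbf{v}_1 \cdot \mathbf{v}_1}\mathbf{v}_1
 + \frac{\mathbf{p}_1 \cdot \mathbf{v}_2}{\mathbf{v}_2 \cdot \mathbf{v}_2}\mathbf{v}_2
 + \frac{\mathbf{p}_1 \cdot \mathbf{v}_3}{\mathbf{v}_3 \cdot \mathbf{v}_3}\mathbf{v}_3 \\
&= \frac{18}{54}\mathbf{v}_1 + \frac{18}{54}\mathbf{v}_2 + \frac{18}{54}\mathbf{v}_3 \\
&= \frac{1}{3}\begin{bmatrix} 3 \\ 0 \\ 6 \\ -3 \end{bmatrix}
 + \frac{1}{3}\begin{bmatrix} -6 \\ 3 \\ 3 \\ 0 \end{bmatrix}
 + \frac{1}{3}\begin{bmatrix} 3 \\ 6 \\ 0 \\ 3 \end{bmatrix}
 = \begin{bmatrix} 0 \\ 3 \\ 3 \\ 0 \end{bmatrix} = \mathbf{p}_1
\end{aligned}
\]

This shows that $\mathbf{p}_1$ is in Span $S$. Also, since the coefficients sum to 1, $\mathbf{p}_1$ is in aff $S$. In
fact, $\mathbf{p}_1$ is in conv $S$, because the coefficients are also nonnegative.

For $\mathbf{p}_2$, a similar calculation shows that $\operatorname{proj}_W \mathbf{p}_2 \ne \mathbf{p}_2$. Since $\operatorname{proj}_W \mathbf{p}_2$ is the closest
point in Span $S$ to $\mathbf{p}_2$, the point $\mathbf{p}_2$ is not in Span $S$. In particular, $\mathbf{p}_2$ cannot be in aff $S$
or conv $S$. ■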
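The membership tests of Example 1 can be sketched in Python. The sketch assumes, as in the example, that the set $S$ is orthogonal and contains no zero vector, so the projection weights are simple quotients of dot products; the function name `classify` is hypothetical.

```python
from fractions import Fraction as F

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

def classify(orthogonal_set, p):
    """Return (weights, in_span, in_aff, in_conv) for an orthogonal set S.

    weights are the coefficients of the orthogonal projection of p onto Span S.
    Entries are assumed to be integers so that Fraction(num, den) is exact.
    """
    weights = [F(dot(p, v), dot(v, v)) for v in orthogonal_set]
    proj = [sum(w * v[k] for w, v in zip(weights, orthogonal_set))
            for k in range(len(p))]
    in_span = proj == list(p)                 # p equals its own projection
    in_aff = in_span and sum(weights) == 1    # weights sum to 1
    in_conv = in_aff and all(w >= 0 for w in weights)
    return weights, in_span, in_aff, in_conv

S = [[3, 0, 6, -3], [-6, 3, 3, 0], [3, 6, 0, 3]]
p1 = [0, 3, 3, 0]
p2 = [-10, 5, 11, -4]

result1 = classify(S, p1)
result2 = classify(S, p2)
print(result1)
print(result2)
```

For $\mathbf{p}_2$ the projection weights are 8/9, 2, and -2/9, and the projection differs from $\mathbf{p}_2$, which is the numerical form of the statement $\operatorname{proj}_W \mathbf{p}_2 \ne \mathbf{p}_2$.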


Recall that a set $S$ is affine if it contains all lines determined by pairs of points in $S$.
When attention is restricted to convex combinations, the appropriate condition involves
line segments rather than lines.


DEFINITION


A set $S$ is convex if for each $\mathbf{p}, \mathbf{q} \in S$, the line segment $\overline{\mathbf{pq}}$ is contained in $S$.


Intuitively, a set $S$ is convex if every two points in the set can “see” each other
without the line of sight leaving the set. Figure 1 illustrates this idea.


FIGURE 1 (panels: Convex; Not convex)


The next result is analogous to Theorem 2 for affine sets.


THEOREM 7


A set $S$ is convex if and only if every convex combination of points of $S$ lies in
$S$. That is, $S$ is convex if and only if $S = $ conv $S$.


PROOF The argument is similar to the proof of Theorem 2. The only difference is
in the induction step. When taking a convex combination of $k + 1$ points, consider
$\mathbf{y} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k + c_{k+1}\mathbf{v}_{k+1}$, where $c_1 + \cdots + c_{k+1} = 1$ and $0 \le c_i \le 1$ for
all $i$. If $c_{k+1} = 1$, then $\mathbf{y} = \mathbf{v}_{k+1}$, which belongs to $S$, and there is nothing further
to prove. If $c_{k+1} < 1$, let $t = c_1 + \cdots + c_k$. Then $t = 1 - c_{k+1} > 0$ and

\[ \mathbf{y} = (1 - c_{k+1})\left(\frac{c_1}{t}\mathbf{v}_1 + \cdots + \frac{c_k}{t}\mathbf{v}_k\right) + c_{k+1}\mathbf{v}_{k+1} \qquad (1) \]

By the induction hypothesis, the point $\mathbf{z} = (c_1/t)\mathbf{v}_1 + \cdots + (c_k/t)\mathbf{v}_k$ is in $S$, since the
nonnegative coefficients sum to 1. Thus equation (1) displays $\mathbf{y}$ as a convex combination
of two points in $S$. By the principle of induction, every convex combination of such
points lies in $S$. ■

Theorem 9 below provides a more geometric characterization of the convex hull 
of a set. It requires a preliminary result on intersections of sets. Recall from Section 
4.1 (Exercise 32) that the intersection of two subspaces is itself a subspace. In fact, the 
intersection of any collection of subspaces is itself a subspace. A similar result holds 
for affine sets and convex sets. 


THEOREM 8


Let $\{S_\alpha : \alpha \in \mathcal{A}\}$ be any collection of convex sets. Then $\bigcap_{\alpha \in \mathcal{A}} S_\alpha$ is convex. If
$\{T_\beta : \beta \in \mathcal{B}\}$ is any collection of affine sets, then $\bigcap_{\beta \in \mathcal{B}} T_\beta$ is affine.

PROOF If $\mathbf{p}$ and $\mathbf{q}$ are in $\bigcap S_\alpha$, then $\mathbf{p}$ and $\mathbf{q}$ are in each $S_\alpha$. Since each $S_\alpha$ is convex,
the line segment between $\mathbf{p}$ and $\mathbf{q}$ is in $S_\alpha$ for all $\alpha$ and hence that segment is contained
in $\bigcap S_\alpha$. The proof of the affine case is similar. ■


THEOREM 9


For any set $S$, the convex hull of $S$ is the intersection of all the convex sets that
contain $S$.


PROOF Let $T$ denote the intersection of all the convex sets containing $S$. Since conv $S$
is a convex set containing $S$, it follows that $T \subset$ conv $S$. On the other hand, let $C$ be
any convex set containing $S$. Then $C$ contains every convex combination of points
of $C$ (Theorem 7), and hence also contains every convex combination of points of the
subset $S$. That is, conv $S \subset C$. Since this is true for every convex set $C$ containing $S$,
it is also true for the intersection of them all. That is, conv $S \subset T$. ■

Theorem 9 shows that conv $S$ is in a natural sense the “smallest” convex set containing
$S$. For example, consider a set $S$ that lies inside some large rectangle in $\mathbb{R}^2$, and
imagine stretching a rubber band around the outside of $S$. As the rubber band contracts
around $S$, it outlines the boundary of the convex hull of $S$. Or to use another analogy,
the convex hull of $S$ fills in all the holes in the inside of $S$ and fills out all the dents in
the boundary of $S$.

EXAMPLE 2

a. The convex hulls of sets $S$ and $T$ in $\mathbb{R}^2$ are shown below.

(Figure panels: $S$; conv $S$; $T$; conv $T$)

b. Let $S$ be the set consisting of the standard basis for $\mathbb{R}^3$, $S = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$. Then conv $S$
is a triangular surface in $\mathbb{R}^3$, with vertices $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$. See Fig. 2. ■


FIGURE 3 (curve label: $y = x^2$)


EXAMPLE 3 Let $S = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge 0 \text{ and } y = x^2 \right\}$. Show that the convex hull of
$S$ is the union of the origin and $\left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x > 0 \text{ and } y \ge x^2 \right\}$. See Fig. 3.


SOLUTION Every point in conv $S$ must lie on a line segment that connects two points
of $S$. The dashed line in Fig. 3 indicates that, except for the origin, the positive $y$-axis
is not in conv $S$, because the origin is the only point of $S$ on the $y$-axis. It may
seem reasonable that Fig. 3 does show conv $S$, but how can you be sure that the point
$(10^{-2}, 10^4)$, for example, is on a line segment from the origin to a point on the curve in
$S$?

Consider any point $\mathbf{p}$ in the shaded region of Fig. 3, say

\[ \mathbf{p} = \begin{bmatrix} a \\ b \end{bmatrix}, \quad \text{with } a > 0 \text{ and } b \ge a^2 \]

The line through $\mathbf{0}$ and $\mathbf{p}$ has the equation $y = (b/a)t$ for $t$ real. That line intersects
$S$ where $t$ satisfies $(b/a)t = t^2$, that is, when $t = b/a$. Thus, $\mathbf{p}$ is on the line segment
from $\mathbf{0}$ to $\begin{bmatrix} b/a \\ b^2/a^2 \end{bmatrix}$, which shows that Fig. 3 is correct. ■
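The argument of Example 3 can be checked numerically for the point $(10^{-2}, 10^4)$ mentioned in the solution. The sketch below is only an illustration; the function name `segment_through` is hypothetical, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction as F

def segment_through(a, b):
    """For p = (a, b) with a > 0 and b >= a**2, return (q, s) where q is the
    point (b/a, b**2/a**2) on the parabola y = x**2 and p = s*q with 0 < s <= 1."""
    if not (a > 0 and b >= a * a):
        raise ValueError("point is outside the shaded region")
    t = F(b) / F(a)      # the line through 0 and p meets the parabola at x = t
    q = (t, t * t)
    s = F(a) / t         # p = s*q, and s = a**2/b <= 1 because b >= a**2
    return q, s

q, s = segment_through(F(1, 100), F(10**4))
p_again = (s * q[0], s * q[1])
print(q, s, p_again)
```

Here $s$ is the fraction of the way from the origin to the parabola point $\mathbf{q}$ at which $\mathbf{p}$ sits, so $\mathbf{p} = (1 - s)\mathbf{0} + s\mathbf{q}$ is a convex combination of two points of $S$.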


The following theorem is basic in the study of convex sets. It was first proved by
Constantin Carathéodory in 1907. If $\mathbf{p}$ is in the convex hull of $S$, then, by definition, $\mathbf{p}$
must be a convex combination of points of $S$. But the definition makes no stipulation
as to how many points of $S$ are required to make the combination. Carathéodory’s
remarkable theorem says that in an $n$-dimensional space, the number of points of $S$ in
the convex combination never has to be more than $n + 1$.


THEOREM 10


(Carathéodory) If $S$ is a nonempty subset of $\mathbb{R}^n$, then every point in conv $S$ can
be expressed as a convex combination of $n + 1$ or fewer points of $S$.


PROOF Given $\mathbf{p}$ in conv $S$, one may write $\mathbf{p} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$, where $\mathbf{v}_i \in S$,
$c_1 + \cdots + c_k = 1$, and $c_i \ge 0$, for some $k$ and $i = 1, \ldots, k$. The goal is to show that
such an expression exists for $\mathbf{p}$ with $k \le n + 1$.

If $k > n + 1$, then $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is affinely dependent, by Exercise 12 in Section 8.2.
Thus there exist scalars $d_1, \ldots, d_k$, not all zero, such that

\[ \sum_{i=1}^{k} d_i\mathbf{v}_i = \mathbf{0} \quad \text{and} \quad \sum_{i=1}^{k} d_i = 0 \]

Consider the two equations

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{p} \]

and

\[ d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_k\mathbf{v}_k = \mathbf{0} \]

By subtracting an appropriate multiple of the second equation from the first, we now
eliminate one of the $\mathbf{v}_i$ terms and obtain a convex combination of fewer than $k$ elements
of $S$ that is equal to $\mathbf{p}$.

Since not all of the $d_i$ coefficients are zero, we may assume (by reordering subscripts
if necessary) that $d_k > 0$ and that $c_k/d_k \le c_i/d_i$ for all those $i$ for which $d_i > 0$.
For $i = 1, \ldots, k$, let $b_i = c_i - (c_k/d_k)d_i$. Then $b_k = 0$ and

\[ \sum_{i=1}^{k} b_i = \sum_{i=1}^{k} c_i - \frac{c_k}{d_k}\sum_{i=1}^{k} d_i = 1 - 0 = 1 \]

Furthermore, each $b_i \ge 0$. Indeed, if $d_i \le 0$, then $b_i \ge c_i \ge 0$. If $d_i > 0$, then $b_i =
d_i(c_i/d_i - c_k/d_k) \ge 0$. By construction,

\[
\sum_{i=1}^{k-1} b_i\mathbf{v}_i = \sum_{i=1}^{k} b_i\mathbf{v}_i
= \sum_{i=1}^{k} \left(c_i - \frac{c_k}{d_k}d_i\right)\mathbf{v}_i
= \sum_{i=1}^{k} c_i\mathbf{v}_i - \frac{c_k}{d_k}\sum_{i=1}^{k} d_i\mathbf{v}_i
= \sum_{i=1}^{k} c_i\mathbf{v}_i = \mathbf{p}
\]

Thus $\mathbf{p}$ is now a convex combination of $k - 1$ of the points $\mathbf{v}_1, \ldots, \mathbf{v}_k$. This process
may be repeated until $\mathbf{p}$ is expressed as a convex combination of at most $n + 1$ of the
points of $S$. ■


The following example illustrates the calculations in the proof above. 


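EXAMPLE 4 Let

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 5 \\ 4 \end{bmatrix}, \quad
\mathbf{v}_4 = \begin{bmatrix} 3 \\ 0 \end{bmatrix}, \quad
\mathbf{p} = \begin{bmatrix} 10/3 \\ 5/2 \end{bmatrix}
\]

and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$. Then

\[ \tfrac{1}{4}\mathbf{v}_1 + \tfrac{1}{6}\mathbf{v}_2 + \tfrac{1}{2}\mathbf{v}_3 + \tfrac{1}{12}\mathbf{v}_4 = \mathbf{p} \qquad (2) \]

Use the procedure in the proof of Carathéodory’s Theorem to express $\mathbf{p}$ as a convex
combination of three points of $S$.

SOLUTION The set $S$ is affinely dependent. Use the technique of Section 8.2 to obtain
an affine dependence relation

\[ -5\mathbf{v}_1 + 4\mathbf{v}_2 - 3\mathbf{v}_3 + 4\mathbf{v}_4 = \mathbf{0} \qquad (3) \]

Next, choose the points $\mathbf{v}_2$ and $\mathbf{v}_4$ in (3), whose coefficients are positive. For each
point, compute the ratio of its coefficients in equations (2) and (3). The ratio for $\mathbf{v}_2$ is
$\tfrac{1}{6} \div 4 = \tfrac{1}{24}$, and that for $\mathbf{v}_4$ is $\tfrac{1}{12} \div 4 = \tfrac{1}{48}$. The ratio for $\mathbf{v}_4$ is smaller, so subtract $\tfrac{1}{48}$
times equation (3) from equation (2) to eliminate $\mathbf{v}_4$:

\[
\left(\tfrac{1}{4} + \tfrac{5}{48}\right)\mathbf{v}_1 + \left(\tfrac{1}{6} - \tfrac{4}{48}\right)\mathbf{v}_2 + \left(\tfrac{1}{2} + \tfrac{3}{48}\right)\mathbf{v}_3 + \left(\tfrac{1}{12} - \tfrac{4}{48}\right)\mathbf{v}_4 = \mathbf{p}
\]

\[ \tfrac{17}{48}\mathbf{v}_1 + \tfrac{4}{48}\mathbf{v}_2 + \tfrac{27}{48}\mathbf{v}_3 = \mathbf{p} \qquad ■ \]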
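The reduction step used in the proof of Carathéodory’s Theorem and in Example 4 can be sketched as a short function. This is an illustrative sketch under stated assumptions: the name `caratheodory_step` is hypothetical, the weights `c` are assumed to be convex weights, and `d` is assumed to be a genuine affine dependence relation on the same points (sum zero, not all zero), so at least one `d[i]` is positive.

```python
from fractions import Fraction as F

def caratheodory_step(c, d):
    """One reduction step: return weights b_i = c_i - (c_k/d_k) d_i, where k
    minimizes c_i/d_i over the indices with d_i > 0. The new weights are
    nonnegative, sum to 1, and have b_k = 0."""
    k = min((i for i in range(len(d)) if d[i] > 0), key=lambda i: c[i] / d[i])
    ratio = c[k] / d[k]
    return [ci - ratio * di for ci, di in zip(c, d)]

# Data of Example 4: equation (2) gives c, equation (3) gives d.
points = [(1, 0), (2, 3), (5, 4), (3, 0)]
c = [F(1, 4), F(1, 6), F(1, 2), F(1, 12)]
d = [-5, 4, -3, 4]

b = caratheodory_step(c, d)
p = [sum(w * v[k] for w, v in zip(b, points)) for k in range(2)]
print(b, p)
```

Repeating the step with a new dependence relation on the points that still carry positive weight would continue the process described in the proof until at most $n + 1$ points remain.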

The bound in Carathéodory’s Theorem cannot, in general, be improved by decreasing the required number of
points. Indeed, given any three noncollinear points in $\mathbb{R}^2$, the centroid of the triangle
formed by them is in the convex hull of all three, but is not in the convex hull of any two.

PRACTICE PROBLEMS

1. Let $\mathbf{v}_1 = \begin{bmatrix} 6 \\ 2 \\ 2 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 7 \\ 1 \\ 5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} -2 \\ 4 \\ -1 \end{bmatrix}$, $\mathbf{p}_1 = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$, and $\mathbf{p}_2 = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$, and let
$S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Determine whether $\mathbf{p}_1$ and $\mathbf{p}_2$ are in conv $S$.

2. Let $S$ be the set of points on the curve $y = 1/x$ for $x > 0$. Explain geometrically
why conv $S$ consists of all points on and above the curve $S$.


8.3 EXERCISES 


1. In R 2 , let S : 


0 


： 0<^ < 1 U' 


>.Describe (or 


sketch) the convex hull of S. 


2. Describe the convex hull of the set $S$ of points $\begin{bmatrix} x \\ y \end{bmatrix}$ in $\mathbb{R}^2$ that
satisfy the given conditions. Justify your answers. (Show
that an arbitrary point $\mathbf{p}$ in $S$ belongs to conv $S$.)

a. $y = 1/x$ and $x \ge 1/2$

b. $y = \sin x$

c. $y = x^{1/2}$ and $x \ge 0$

3. Consider the points in Exercise 5 in Section 8.1. Which of
$\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are in conv $S$?

4. Consider the points in Exercise 6 in Section 8.1. Which of
$\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are in conv $S$?

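5. Let

\[
\mathbf{v}_1 = \begin{bmatrix} -1 \\ -3 \\ 4 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 0 \\ -3 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix}, \quad
\mathbf{v}_4 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}, \quad
\mathbf{p}_1 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \quad
\mathbf{p}_2 = \begin{bmatrix} 0 \\ -2 \\ 2 \end{bmatrix},
\]

and $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$. Determine whether $\mathbf{p}_1$ and $\mathbf{p}_2$ are in
conv $S$.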


6. Let vi 


2 


0 


-2 


"-1 " 

0 


-2 


1 


2 

-1 

,V2 = 

2 

,V3 = 

0 

Pi = 

_3 

2 


1 


2 


5 







L 2 」 



- 1 - 
_2 


6 " 


"-1 " 


0 


-4 

， and p 4 = 

-2 

P2 = 

1 

4 

， P3 = 

1 

0 


7 

L 4 J 


_-l_ 


4 


and let $S$ be the orthogonal set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Determine whether each $\mathbf{p}_i$ is
in Span $S$, aff $S$, or conv $S$.

a. $\mathbf{p}_1$ b. $\mathbf{p}_2$ c. $\mathbf{p}_3$ d. $\mathbf{p}_4$


Exercises 7-10 use the terminology from Section 8.2. 


7. a. Let T 


Pi 


-1 


2 


4 

0 


3 


1 

， P2 = 

"3" 

2 

， P3 


,and let 


and p 4 : 


0 

2 


Find the barycentric coordinates of $\mathbf{p}_1$, $\mathbf{p}_2$, $\mathbf{p}_3$, and $\mathbf{p}_4$ with
respect to $T$.


b. Use your answers in part (a) to determine whether each
of $\mathbf{p}_1, \ldots, \mathbf{p}_4$ in part (a) is inside, outside, or on the edge
of conv $T$, a triangular region.


8 . 


Repeat Exercise 7 for T = 



and 


2 

Pi = 1 ， P2 = 

9. Let S = {vi, V2, V3, V4} be an affinely independent set. Con¬ 
sider the points pj ,... , p 5 whose barycentric coordinates 
with respect to S are given by (2,0,0, —1), (0, ^), 

(|,0, (I， ^), and ( 金， 0, |,0), respectively. De¬ 

termine whether each of p 1? ..., p 5 is inside, outside, or on 
the surface of conv S, a tetrahedron. Are any of these points 
on an edge of conv S ? 

10. Repeat Exercise 9 for the points q L ,..., q 5 whose barycen¬ 
tric coordinates with respect to S are given by ( 去 ， H 士)， 

(I ， _ i ， 0, -) ，（ 0, ！， * ， 0) ，（ 0, - 2,0, 3 )， and (!，！，|, 0), re¬ 
spectively. 


Ps 


and p 4 : 


In Exercises 11 and 12, mark each statement True or False. Justify
each answer.


11. a. If $\mathbf{y} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$ and $c_1 + c_2 + c_3 = 1$, then $\mathbf{y}$
is a convex combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

b. If $S$ is a nonempty set, then conv $S$ contains some points
that are not in $S$.

c. If $S$ and $T$ are convex sets, then $S \cup T$ is also convex.

12. a. A set is convex if $\mathbf{x}, \mathbf{y} \in S$ implies that the line segment
between $\mathbf{x}$ and $\mathbf{y}$ is contained in $S$.

b. If $S$ and $T$ are convex sets, then $S \cap T$ is also convex.

c. If $S$ is a nonempty subset of $\mathbb{R}^5$ and $\mathbf{y} \in$ conv $S$, then
there exist distinct points $\mathbf{v}_1, \ldots, \mathbf{v}_6$ in $S$ such that $\mathbf{y}$ is
a convex combination of $\mathbf{v}_1, \ldots, \mathbf{v}_6$.

13. Let $S$ be a convex subset of $\mathbb{R}^n$ and suppose that
$f : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation. Prove that the set
$f(S) = \{f(\mathbf{x}) : \mathbf{x} \in S\}$ is a convex subset of $\mathbb{R}^m$.

14. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation and let
$T$ be a convex subset of $\mathbb{R}^m$. Prove that the set
$S = \{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) \in T\}$ is a convex subset of $\mathbb{R}^n$.


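15. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} 4 \\ 2 \end{bmatrix}$, $\mathbf{v}_4 = \begin{bmatrix} 4 \\ 0 \end{bmatrix}$, and $\mathbf{p} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. Confirm that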

Pi 

p = 

|vi + |v 2 + |v 3 + |v 4 and Vi - V 2 + v 3 - v 4 = 0. 

Po 


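b. Find an example in $\mathbb{R}^2$ to show that equality need not hold
in part (a).

20. a. conv $(A \cap B) \subset [(\text{conv } A) \cap (\text{conv } B)]$

b. Find an example in $\mathbb{R}^2$ to show that equality need not hold
in part (a).

21. Let $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$ be points in $\mathbb{R}^n$, and define
$\mathbf{f}_0(t) = (1 - t)\mathbf{p}_0 + t\mathbf{p}_1$, $\mathbf{f}_1(t) = (1 - t)\mathbf{p}_1 + t\mathbf{p}_2$, and
$\mathbf{g}(t) = (1 - t)\mathbf{f}_0(t) + t\mathbf{f}_1(t)$ for $0 \le t \le 1$. For the points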

as shown below, draw a picture that shows f 。（ 臺 )， f i ( 士)， and 

- *P 2 


Use the procedure in the proof of Caratheodory’s Theorem to 
express p as a convex combination of three of the v, ’s. Do 


22. Repeat Exercise 21 for fo (|), fi (|), and g (|). 


this in two ways. 

16. Repeat Exercise 9 for points Vi 
,and p 


V3 : 


V 4 : 


2 


0 , v 2 

given that 


P = T^Vl + + + n V 4 


and 


10vi — 6 v 2 + 7v3 — 11v4 = 0. 

In Exercises 17-20, prove the given statement about subsets $A$
and $B$ of $\mathbb{R}^n$. A proof for an exercise may use results of earlier
exercises.

17. If $A \subset B$ and $B$ is convex, then conv $A \subset B$.

18. If $A \subset B$, then conv $A \subset$ conv $B$.

19. a. $[(\text{conv } A) \cup (\text{conv } B)] \subset$ conv $(A \cup B)$

23. Let $\mathbf{g}(t)$ be defined as in Exercise 21. Its graph is called
a quadratic Bézier curve, and it is used in some computer
graphics designs. The points $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$ are called the
control points for the curve. Compute a formula for $\mathbf{g}(t)$
that involves only $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$. Then show that $\mathbf{g}(t)$ is
in conv $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$ for $0 \le t \le 1$.

24. Given control points $\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ in $\mathbb{R}^n$, let $\mathbf{g}_1(t)$
for $0 \le t \le 1$ be the quadratic Bézier curve from Exercise
23 determined by $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$, and let $\mathbf{g}_2(t)$ be defined
similarly for $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$. For $0 \le t \le 1$, define
$\mathbf{h}(t) = (1 - t)\mathbf{g}_1(t) + t\mathbf{g}_2(t)$. Show that the graph of $\mathbf{h}(t)$
lies in the convex hull of the four control points. This
curve is called a cubic Bézier curve, and its definition here
is one step in an algorithm for constructing Bézier curves
(discussed later in Section 8.6). A Bézier curve of degree
$k$ is determined by $k + 1$ control points, and its graph lies in
the convex hull of these control points.


SOLUTIONS TO PRACTICE PROBLEMS 


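1. The points $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are not orthogonal, so compute

\[
\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix}, \quad
\mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} -8 \\ 2 \\ -3 \end{bmatrix}, \quad
\mathbf{p}_1 - \mathbf{v}_1 = \begin{bmatrix} -5 \\ 1 \\ -1 \end{bmatrix}, \quad \text{and} \quad
\mathbf{p}_2 - \mathbf{v}_1 = \begin{bmatrix} -3 \\ 0 \\ -1 \end{bmatrix}
\]

Augment the matrix $[\,\mathbf{v}_2 - \mathbf{v}_1 \ \ \mathbf{v}_3 - \mathbf{v}_1\,]$ with both $\mathbf{p}_1 - \mathbf{v}_1$ and $\mathbf{p}_2 - \mathbf{v}_1$, and row
reduce:

\[
\begin{bmatrix} 1 & -8 & -5 & -3 \\ -1 & 2 & 1 & 0 \\ 3 & -3 & -1 & -1 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & \tfrac{1}{3} & 1 \\ 0 & 1 & \tfrac{2}{3} & \tfrac{1}{2} \\ 0 & 0 & 0 & -\tfrac{5}{2} \end{bmatrix}
\]

The third column shows that $\mathbf{p}_1 - \mathbf{v}_1 = \tfrac{1}{3}(\mathbf{v}_2 - \mathbf{v}_1) + \tfrac{2}{3}(\mathbf{v}_3 - \mathbf{v}_1)$, which leads to
$\mathbf{p}_1 = 0\mathbf{v}_1 + \tfrac{1}{3}\mathbf{v}_2 + \tfrac{2}{3}\mathbf{v}_3$. Thus $\mathbf{p}_1$ is in conv $S$. In fact, $\mathbf{p}_1$ is in conv $\{\mathbf{v}_2, \mathbf{v}_3\}$.

The last column of the matrix shows that $\mathbf{p}_2 - \mathbf{v}_1$ is not a linear combination of
$\mathbf{v}_2 - \mathbf{v}_1$ and $\mathbf{v}_3 - \mathbf{v}_1$. Thus $\mathbf{p}_2$ is not an affine combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, so $\mathbf{p}_2$
cannot possibly be in conv $S$.

An alternative method of solution is to row reduce the augmented matrix of
homogeneous forms:

\[
\begin{bmatrix} \tilde{\mathbf{v}}_1 & \tilde{\mathbf{v}}_2 & \tilde{\mathbf{v}}_3 & \tilde{\mathbf{p}}_1 & \tilde{\mathbf{p}}_2 \end{bmatrix}
\sim
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & \tfrac{1}{3} & 0 \\ 0 & 0 & 1 & \tfrac{2}{3} & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]

2. If $\mathbf{p}$ is a point above $S$, then the line through $\mathbf{p}$ with slope $-1$ will intersect $S$ at two
points before it reaches the positive $x$- and $y$-axes.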

8.4 HYPERPLANES 


Hyperplanes play a special role in the geometry of $\mathbb{R}^n$ because they divide the space into
two disjoint pieces, just as a plane separates $\mathbb{R}^3$ into two parts and a line cuts through
$\mathbb{R}^2$. The key to working with hyperplanes is to use simple implicit descriptions, rather
than the explicit or parametric representations of lines and planes used in the earlier
work with affine sets.¹

An implicit equation of a line in $\mathbb{R}^2$ has the form $ax + by = d$. An implicit equation
of a plane in $\mathbb{R}^3$ has the form $ax + by + cz = d$. Both equations describe the
line or plane as the set of all points at which a linear expression (also called a linear
functional) has a fixed value, $d$.


DEFINITION A linear functional on $\mathbb{R}^n$ is a linear transformation $f$ from $\mathbb{R}^n$ into $\mathbb{R}$. For each
scalar $d$ in $\mathbb{R}$, the symbol $[f : d]$ denotes the set of all $\mathbf{x}$ in $\mathbb{R}^n$ at which the value
of $f$ is $d$. That is,

\[ [f : d] \text{ is the set } \{\mathbf{x} \in \mathbb{R}^n : f(\mathbf{x}) = d\} \]

The zero functional is the transformation such that $f(\mathbf{x}) = 0$ for all $\mathbf{x}$ in $\mathbb{R}^n$. All
other linear functionals on $\mathbb{R}^n$ are said to be nonzero.


EXAMPLE 1 In $\mathbb{R}^2$, the line $x - 4y = 13$ is a hyperplane in $\mathbb{R}^2$, and it is the set of
points at which the linear functional $f(x, y) = x - 4y$ has the value 13. That is, the
line is the set $[f : 13]$. ■

EXAMPLE 2 In $\mathbb{R}^3$, the plane $5x - 2y + 3z = 21$ is a hyperplane, the set of points
at which the linear functional $g(x, y, z) = 5x - 2y + 3z$ has the value 21. This hyperplane
is the set $[g : 21]$. ■

If $f$ is a linear functional on $\mathbb{R}^n$, then the standard matrix of this linear transformation
$f$ is a $1 \times n$ matrix $A$, say $A = [\,a_1 \ a_2 \ \cdots \ a_n\,]$. So

\[ [f : 0] \text{ is the same as } \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = 0\} = \operatorname{Nul} A \qquad (1) \]

¹ Parametric representations were introduced in Section 1.5.

If $f$ is a nonzero functional, then rank $A = 1$, and dim Nul $A = n - 1$, by the Rank
Theorem.² Thus, the subspace $[f : 0]$ has dimension $n - 1$ and so is a hyperplane. Also,
if $d$ is any number in $\mathbb{R}$, then

\[ [f : d] \text{ is the same as } \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = d\} \qquad (2) \]

Recall from Theorem 6 in Section 1.5 that the set of solutions of $A\mathbf{x} = \mathbf{b}$ is obtained
by translating the solution set of $A\mathbf{x} = \mathbf{0}$, using any particular solution $\mathbf{p}$ of $A\mathbf{x} = \mathbf{b}$.
When $A$ is the standard matrix of the transformation $f$, this theorem says that

\[ [f : d] = [f : 0] + \mathbf{p} \quad \text{for any } \mathbf{p} \text{ in } [f : d] \qquad (3) \]

Thus the sets $[f : d]$ are hyperplanes parallel to $[f : 0]$. See Fig. 1.


FIGURE 1 Parallel hyperplanes, with $f(\mathbf{p}) = d$.

When $A$ is a $1 \times n$ matrix, the equation $A\mathbf{x} = d$ may be written with an inner
product $\mathbf{n} \cdot \mathbf{x}$, using $\mathbf{n}$ in $\mathbb{R}^n$ with the same entries as $A$. Thus, from (2),

\[ [f : d] \text{ is the same as } \{\mathbf{x} \in \mathbb{R}^n : \mathbf{n} \cdot \mathbf{x} = d\} \qquad (4) \]

Then $[f : 0] = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{n} \cdot \mathbf{x} = 0\}$, which shows that $[f : 0]$ is the orthogonal complement
of the subspace spanned by $\mathbf{n}$. In the terminology of calculus and geometry for
$\mathbb{R}^3$, $\mathbf{n}$ is called a normal vector to $[f : 0]$. (A “normal” vector in this sense need not
have unit length.) Also, $\mathbf{n}$ is said to be normal to each parallel hyperplane $[f : d]$, even
though $\mathbf{n} \cdot \mathbf{x}$ is not zero when $d \ne 0$.

Another name for $[f : d]$ is a level set of $f$, and $\mathbf{n}$ is sometimes called the gradient
of $f$ when $f(\mathbf{x}) = \mathbf{n} \cdot \mathbf{x}$ for each $\mathbf{x}$.



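EXAMPLE 3 Let $\mathbf{n} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 1 \\ -6 \end{bmatrix}$, and let $H = \{\mathbf{x} : \mathbf{n} \cdot \mathbf{x} = 12\}$, so $H = [f : 12]$, where $f(x, y) = 3x + 4y$. Thus $H$ is the line $3x + 4y = 12$. Find an implicit description of the parallel hyperplane (line) $H_1 = H + \mathbf{v}$.

SOLUTION First, find a point $\mathbf{p}$ in $H_1$. To do this, find a point in $H$ and add $\mathbf{v}$ to it. For instance, $\begin{bmatrix} 4 \\ 0 \end{bmatrix}$ is in $H$, so

$$\mathbf{p} = \begin{bmatrix} 1 \\ -6 \end{bmatrix} + \begin{bmatrix} 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -6 \end{bmatrix}$$

is in $H_1$. Now, compute $\mathbf{n} \cdot \mathbf{p} = 3(5) + 4(-6) = -9$. This shows that $H_1 = [f : -9]$. See Fig. 2, which also shows the subspace $H_0 = \{\mathbf{x} : \mathbf{n} \cdot \mathbf{x} = 0\}$. ■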
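The translation step in Example 3 can be checked numerically. The following is a minimal Python sketch, not part of the text; the helper name `dot` is an illustrative choice. It uses the fact that every point of $H + \mathbf{v}$ has the form $\mathbf{x} + \mathbf{v}$ with $\mathbf{n} \cdot \mathbf{x} = d$, so the translated level is $d + \mathbf{n} \cdot \mathbf{v}$.

```python
def dot(u, v):
    """Inner product of two vectors given as equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

n = (3, 4)     # normal vector of H = {x : n.x = 12}
d = 12
v = (1, -6)    # translation vector, so H1 = H + v

# n.(x + v) = n.x + n.v = d + n.v for every x in H.
d1 = d + dot(n, v)

# The same level computed from the particular point used in the example.
p = (4 + v[0], 0 + v[1])
level_at_p = dot(n, p)
```

Both `d1` and `level_at_p` describe the same line $[f : d_1]$, which is why any point of $H$ may be used in the solution.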


The next three examples show connections between implicit and explicit descriptions of hyperplanes. Example 4 begins with an implicit form.


2 See Theorem 14 in Section 2.9 or Theorem 14 in Section 4.6.

EXAMPLE 4 In $\mathbb{R}^2$, give an explicit description of the line $x - 4y = 13$ in parametric vector form.

SOLUTION This amounts to solving a nonhomogeneous equation $A\mathbf{x} = b$, where $A = [\,1 \ \ {-4}\,]$ and $b$ is the number 13 in $\mathbb{R}$. Write $x = 13 + 4y$, where $y$ is a free variable. In parametric form, the solution is

$$\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 13 + 4y \\ y \end{bmatrix} = \begin{bmatrix} 13 \\ 0 \end{bmatrix} + y \begin{bmatrix} 4 \\ 1 \end{bmatrix} = \mathbf{p} + y\mathbf{q}, \quad y \in \mathbb{R}$$ ■


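Converting an explicit description of a line into implicit form is more involved. The basic idea is to construct $[f : 0]$ and then find $d$ for $[f : d]$.

EXAMPLE 5 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 6 \\ 0 \end{bmatrix}$, and let $L_1$ be the line through $\mathbf{v}_1$ and $\mathbf{v}_2$. Find a linear functional $f$ and a constant $d$ such that $L_1 = [f : d]$.

SOLUTION The line $L_1$ is parallel to the translated line $L_0$ through $\mathbf{v}_2 - \mathbf{v}_1$ and the origin. The defining equation for $L_0$ has the form

$$[\,a \ \ b\,] \begin{bmatrix} x \\ y \end{bmatrix} = 0 \quad \text{or} \quad \mathbf{n} \cdot \mathbf{x} = 0, \quad \text{where } \mathbf{n} = \begin{bmatrix} a \\ b \end{bmatrix} \qquad (5)$$

Since $\mathbf{n}$ is orthogonal to the subspace $L_0$, which contains $\mathbf{v}_2 - \mathbf{v}_1$, compute

$$\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 6 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}$$

and solve

$$[\,a \ \ b\,] \begin{bmatrix} 5 \\ -2 \end{bmatrix} = 0$$

By inspection, a solution is $[\,a \ \ b\,] = [\,2 \ \ 5\,]$. Let $f(x, y) = 2x + 5y$. From (5), $L_0 = [f : 0]$, and $L_1 = [f : d]$ for some $d$. Since $\mathbf{v}_1$ is on line $L_1$, $d = f(\mathbf{v}_1) = 2(1) + 5(2) = 12$. Thus, the equation for $L_1$ is $2x + 5y = 12$. As a check, note that $f(\mathbf{v}_2) = f(6, 0) = 2(6) + 5(0) = 12$, so $\mathbf{v}_2$ is on $L_1$, too. ■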
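The procedure of Example 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the text's method verbatim: the function name `line_through` is hypothetical, and instead of solving by inspection it picks the normal by rotating the direction vector a quarter turn, which always gives a vector orthogonal to $\mathbf{v}_2 - \mathbf{v}_1$ in $\mathbb{R}^2$.

```python
def line_through(v1, v2):
    """Return (n, d) with n.x = d describing the line through v1 and v2 in R^2.

    Assumes v1 != v2. Any nonzero multiple of n (with d scaled the same way)
    describes the same line.
    """
    dx, dy = v2[0] - v1[0], v2[1] - v1[1]   # direction v2 - v1
    n = (-dy, dx)                           # orthogonal to the direction
    d = n[0] * v1[0] + n[1] * v1[1]         # d = f(v1)
    return n, d

n, d = line_through((1, 2), (6, 0))
check = n[0] * 6 + n[1] * 0                 # f(v2) should equal d
```

For the data of Example 5 the rotated direction $(5, -2)$ gives the same normal $(2, 5)$ that the text finds by inspection.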



" 1 " 


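EXAMPLE 6 Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}$, and $\mathbf{v}_3 = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}$. Find an implicit description $[f : d]$ of the plane $H_1$ that passes through $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.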



































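SOLUTION $H_1$ is parallel to a plane $H_0$ through the origin that contains the translated points

$$\mathbf{v}_2 - \mathbf{v}_1 = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_3 - \mathbf{v}_1 = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$$

Since these two points are linearly independent, $H_0 = \operatorname{Span}\{\mathbf{v}_2 - \mathbf{v}_1, \mathbf{v}_3 - \mathbf{v}_1\}$. Let $\mathbf{n} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ be the normal to $H_0$. Then $\mathbf{v}_2 - \mathbf{v}_1$ and $\mathbf{v}_3 - \mathbf{v}_1$ are each orthogonal to $\mathbf{n}$. That is, $(\mathbf{v}_2 - \mathbf{v}_1) \cdot \mathbf{n} = 0$ and $(\mathbf{v}_3 - \mathbf{v}_1) \cdot \mathbf{n} = 0$. These two equations form a system whose augmented matrix can be row reduced:

$$[\,1 \ \ {-2} \ \ 3\,] \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0, \qquad [\,2 \ \ 0 \ \ 1\,] \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0, \qquad \begin{bmatrix} 1 & -2 & 3 & 0 \\ 2 & 0 & 1 & 0 \end{bmatrix}$$

Row operations yield $a = (-\tfrac{1}{2})c$, $b = (\tfrac{5}{4})c$, with $c$ free. Set $c = 4$, for instance. Then $\mathbf{n} = \begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix}$ and $H_0 = [f : 0]$, where $f(\mathbf{x}) = -2x_1 + 5x_2 + 4x_3$.

The parallel hyperplane $H_1$ is $[f : d]$. To find $d$, use the fact that $\mathbf{v}_1$ is in $H_1$, and compute $d = f(\mathbf{v}_1) = f(1, 1, 1) = -2(1) + 5(1) + 4(1) = 7$. As a check, compute $f(\mathbf{v}_2) = f(2, -1, 4) = -2(2) + 5(-1) + 4(4) = 16 - 9 = 7$. ■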


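The procedure in Example 6 generalizes to higher dimensions. However, for the special case of $\mathbb{R}^3$, one can also use the cross-product formula to compute $\mathbf{n}$, using a symbolic determinant as a mnemonic device:

$$\mathbf{n} = (\mathbf{v}_2 - \mathbf{v}_1) \times (\mathbf{v}_3 - \mathbf{v}_1) = \begin{vmatrix} 1 & 2 & \mathbf{i} \\ -2 & 0 & \mathbf{j} \\ 3 & 1 & \mathbf{k} \end{vmatrix} = \begin{vmatrix} -2 & 0 \\ 3 & 1 \end{vmatrix} \mathbf{i} - \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix} \mathbf{j} + \begin{vmatrix} 1 & 2 \\ -2 & 0 \end{vmatrix} \mathbf{k} = -2\mathbf{i} + 5\mathbf{j} + 4\mathbf{k} = \begin{bmatrix} -2 \\ 5 \\ 4 \end{bmatrix}$$

If only the formula for $f$ is needed, the cross-product calculation may be written as an ordinary determinant:

$$f(x_1, x_2, x_3) = \begin{vmatrix} 1 & 2 & x_1 \\ -2 & 0 & x_2 \\ 3 & 1 & x_3 \end{vmatrix} = \begin{vmatrix} -2 & 0 \\ 3 & 1 \end{vmatrix} x_1 - \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix} x_2 + \begin{vmatrix} 1 & 2 \\ -2 & 0 \end{vmatrix} x_3 = -2x_1 + 5x_2 + 4x_3$$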
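The cross-product route to the plane of Example 6 can be written as a short Python sketch. The helper names (`sub`, `dot`, `cross`, `plane_through`) are illustrative choices, not notation from the text, and the sketch assumes the three points are not collinear.

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_through(v1, v2, v3):
    """Return (n, d) with n.x = d describing the plane through three points."""
    n = cross(sub(v2, v1), sub(v3, v1))   # normal to both edge vectors
    return n, dot(n, v1)                  # d = f(v1)

v1, v2, v3 = (1, 1, 1), (2, -1, 4), (3, 1, 2)
n, d = plane_through(v1, v2, v3)
```

Evaluating `dot(n, v2)` and `dot(n, v3)` reproduces the check at the end of Example 6: all three points give the same value $d$.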


So far, every hyperplane examined has been described as $[f : d]$ for some linear functional $f$ and some $d$ in $\mathbb{R}$, or equivalently as $\{\mathbf{x} \in \mathbb{R}^n : \mathbf{n} \cdot \mathbf{x} = d\}$ for some $\mathbf{n}$ in $\mathbb{R}^n$. The following theorem shows that every hyperplane has these equivalent descriptions.


THEOREM 11 A subset $H$ of $\mathbb{R}^n$ is a hyperplane if and only if $H = [f : d]$ for some nonzero linear functional $f$ and some scalar $d$ in $\mathbb{R}$. Thus, if $H$ is a hyperplane, there exist a nonzero vector $\mathbf{n}$ and a real number $d$ such that $H = \{\mathbf{x} : \mathbf{n} \cdot \mathbf{x} = d\}$.

FIGURE 3 The set $S$ is closed and bounded.


PROOF Suppose that $H$ is a hyperplane, take $\mathbf{p} \in H$, and let $H_0 = H - \mathbf{p}$. Then $H_0$ is an $(n - 1)$-dimensional subspace. Next, take any point $\mathbf{y}$ that is not in $H_0$. By the Orthogonal Decomposition Theorem in Section 6.3,

$$\mathbf{y} = \mathbf{y}_1 + \mathbf{n}$$

where $\mathbf{y}_1$ is a vector in $H_0$ and $\mathbf{n}$ is orthogonal to every vector in $H_0$. The function $f$ defined by

$$f(\mathbf{x}) = \mathbf{n} \cdot \mathbf{x} \quad \text{for } \mathbf{x} \in \mathbb{R}^n$$

is a linear functional, by properties of the inner product. Now, $[f : 0]$ is a hyperplane that contains $H_0$, by construction of $\mathbf{n}$. It follows that

$$H_0 = [f : 0]$$

[Argument: $H_0$ contains a basis $S$ of $n - 1$ vectors, and since $S$ is in the $(n - 1)$-dimensional subspace $[f : 0]$, $S$ must also be a basis for $[f : 0]$, by the Basis Theorem.] Finally, let $d = f(\mathbf{p}) = \mathbf{n} \cdot \mathbf{p}$. Then, as in (3) shown earlier,

$$[f : d] = [f : 0] + \mathbf{p} = H_0 + \mathbf{p} = H$$

The converse statement that $[f : d]$ is a hyperplane follows from (1) and (3) above. ■


Many important applications of hyperplanes depend on the possibility of "separating" two sets by a hyperplane. Intuitively, this means that one of the sets is on one side of the hyperplane and the other set is on the other side. The following terminology and notation will help to make this idea more precise.


TOPOLOGY IN $\mathbb{R}^n$: TERMS AND FACTS

For any point $\mathbf{p}$ in $\mathbb{R}^n$ and any real $\delta > 0$, the open ball $B(\mathbf{p}, \delta)$ with center $\mathbf{p}$ and radius $\delta$ is given by

$$B(\mathbf{p}, \delta) = \{\mathbf{x} : \|\mathbf{x} - \mathbf{p}\| < \delta\}$$

Given a set $S$ in $\mathbb{R}^n$, a point $\mathbf{p}$ is an interior point of $S$ if there exists a $\delta > 0$ such that $B(\mathbf{p}, \delta) \subset S$. If every open ball centered at $\mathbf{p}$ intersects both $S$ and the complement of $S$, then $\mathbf{p}$ is called a boundary point of $S$. A set is open if it contains none of its boundary points. (This is equivalent to saying that all of its points are interior points.) A set is closed if it contains all of its boundary points. (If $S$ contains some but not all of its boundary points, then $S$ is neither open nor closed.) A set $S$ is bounded if there exists a $\delta > 0$ such that $S \subset B(\mathbf{0}, \delta)$. A set in $\mathbb{R}^n$ is compact if it is closed and bounded.

Theorem: The convex hull of an open set is open, and the convex hull of a compact set is compact. (The convex hull of a closed set need not be closed. See Exercise 27.)


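EXAMPLE 7 Let

$$S = \operatorname{conv}\left\{ \begin{bmatrix} -2 \\ 2 \end{bmatrix}, \begin{bmatrix} -2 \\ -2 \end{bmatrix}, \begin{bmatrix} 2 \\ -2 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \end{bmatrix} \right\}, \quad \mathbf{p}_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{p}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

as shown in Fig. 3. Then $\mathbf{p}_1$ is an interior point since $B(\mathbf{p}_1, \delta) \subset S$ for a sufficiently small radius (any $\delta \le 1$ works). The point $\mathbf{p}_2$ is a boundary point since every open ball centered at $\mathbf{p}_2$ intersects both $S$ and the complement of $S$. The set $S$ is closed since it contains all its boundary points. The set $S$ is bounded since $S \subset B(\mathbf{0}, 3)$. Thus $S$ is also compact. ■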




















Notation: If $f$ is a linear functional, then $f(A) \le d$ means $f(\mathbf{x}) \le d$ for each $\mathbf{x} \in A$. Corresponding notations will be used when the inequalities are reversed or when they are strict.

DEFINITION The hyperplane $H = [f : d]$ separates two sets $A$ and $B$ if one of the following holds:

(i) $f(A) \le d$ and $f(B) \ge d$, or

(ii) $f(A) \ge d$ and $f(B) \le d$.

If in the conditions above all the weak inequalities are replaced by strict inequalities, then $H$ is said to strictly separate $A$ and $B$.

Notice that strict separation requires that the two sets be disjoint, while mere separation does not. Indeed, if two circles in the plane are externally tangent, then their common tangent line separates them (but does not separate them strictly).

Although it is necessary that two sets be disjoint in order to strictly separate them, this condition is not sufficient, even for closed convex sets. For example, let

$$A = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge \tfrac{1}{2} \text{ and } \tfrac{1}{x} \le y \le 2 \right\} \quad \text{and} \quad B = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} : x \ge 0 \text{ and } y = 0 \right\}$$

Then $A$ and $B$ are disjoint closed convex sets, but they cannot be strictly separated by a hyperplane (line in $\mathbb{R}^2$). See Fig. 4. Thus the problem of separating (or strictly separating) two sets by a hyperplane is more complex than it might at first appear.

FIGURE 4 Disjoint closed convex sets.

There are many interesting conditions on the sets $A$ and $B$ that imply the existence of a separating hyperplane, but the following two theorems are sufficient for this section. The proof of the first theorem requires quite a bit of preliminary material,$^3$ but the second theorem follows easily from the first.

THEOREM 12 Suppose $A$ and $B$ are nonempty convex sets such that $A$ is compact and $B$ is closed. Then there exists a hyperplane $H$ that strictly separates $A$ and $B$ if and only if $A \cap B = \emptyset$.

THEOREM 13 Suppose $A$ and $B$ are nonempty compact sets. Then there exists a hyperplane that strictly separates $A$ and $B$ if and only if $(\operatorname{conv} A) \cap (\operatorname{conv} B) = \emptyset$.


3 A proof of Theorem 12 is given in Steven R. Lay, Convex Sets and Their Applications (New York: John Wiley & Sons, 1982; Mineola, NY: Dover Publications, 2007), pp. 34-39.

PROOF Suppose that $(\operatorname{conv} A) \cap (\operatorname{conv} B) = \emptyset$. Since the convex hull of a compact set is compact, Theorem 12 ensures that there is a hyperplane $H$ that strictly separates $\operatorname{conv} A$ and $\operatorname{conv} B$. Clearly, $H$ also strictly separates the smaller sets $A$ and $B$.

Conversely, suppose the hyperplane $H = [f : d]$ strictly separates $A$ and $B$. Without loss of generality, assume that $f(A) < d$ and $f(B) > d$. Let $\mathbf{x} = c_1\mathbf{x}_1 + \cdots + c_k\mathbf{x}_k$ be any convex combination of elements of $A$. Then

$$f(\mathbf{x}) = c_1 f(\mathbf{x}_1) + \cdots + c_k f(\mathbf{x}_k) < c_1 d + \cdots + c_k d = d$$

since $c_1 + \cdots + c_k = 1$. Thus $f(\operatorname{conv} A) < d$. Likewise, $f(\operatorname{conv} B) > d$, so $H = [f : d]$ strictly separates $\operatorname{conv} A$ and $\operatorname{conv} B$. By Theorem 12, $\operatorname{conv} A$ and $\operatorname{conv} B$ must be disjoint. ■

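EXAMPLE 8 Let

$$\mathbf{a}_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} -3 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{a}_3 = \begin{bmatrix} 3 \\ 4 \\ 0 \end{bmatrix}, \quad \mathbf{b}_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \quad \text{and} \quad \mathbf{b}_2 = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$$

and let $A = \{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ and $B = \{\mathbf{b}_1, \mathbf{b}_2\}$. Show that the hyperplane $H = [f : 5]$, where $f(x_1, x_2, x_3) = 2x_1 - 3x_2 + x_3$, does not separate $A$ and $B$. Is there a hyperplane parallel to $H$ that does separate $A$ and $B$? Do the convex hulls of $A$ and $B$ intersect?

SOLUTION Evaluate the linear functional $f$ at each of the points in $A$ and $B$:

$$f(\mathbf{a}_1) = 2, \quad f(\mathbf{a}_2) = -11, \quad f(\mathbf{a}_3) = -6, \quad f(\mathbf{b}_1) = 4, \quad \text{and} \quad f(\mathbf{b}_2) = 12$$

Since $f(\mathbf{b}_1) = 4$ is less than 5 and $f(\mathbf{b}_2) = 12$ is greater than 5, points of $B$ lie on both sides of $H = [f : 5]$ and so $H$ does not separate $A$ and $B$.

Since $f(A) < 3$ and $f(B) > 3$, the parallel hyperplane $[f : 3]$ strictly separates $A$ and $B$. By Theorem 13, $(\operatorname{conv} A) \cap (\operatorname{conv} B) = \emptyset$.

Caution: If there were no hyperplane parallel to $H$ that strictly separated $A$ and $B$, this would not necessarily imply that their convex hulls intersect. It might be that some other hyperplane not parallel to $H$ would strictly separate them. ■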
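The evaluation in Example 8 is easy to automate for finite sets. The following Python sketch is an illustration, not part of the text; the function name `strict_levels` is hypothetical. It returns the open interval of values $d$ for which $[f : d]$ strictly separates two finite sets, or `None` when no hyperplane parallel to $[f : 0]$ does so.

```python
def f(x):
    """The linear functional of Example 8."""
    return 2 * x[0] - 3 * x[1] + x[2]

A = [(2, 1, 1), (-3, 2, 1), (3, 4, 0)]
B = [(1, 0, 2), (2, -1, 5)]

def strict_levels(f, A, B):
    """Open interval (lo, hi) of levels d with [f : d] strictly separating A and B."""
    fa = [f(x) for x in A]
    fb = [f(x) for x in B]
    if max(fa) < min(fb):          # f(A) < d < f(B)
        return (max(fa), min(fb))
    if max(fb) < min(fa):          # f(B) < d < f(A)
        return (max(fb), min(fa))
    return None                    # the value ranges overlap or touch

interval = strict_levels(f, A, B)
```

Any $d$ strictly inside the returned interval works, which is why the text can choose $d = 3$, while $d = 5$ lies outside the interval. As the Caution notes, a result of `None` would say nothing about hyperplanes that are not parallel to $H$.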


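PRACTICE PROBLEM

Let $\mathbf{p}_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$, $\mathbf{p}_2 = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$, $\mathbf{n}_1 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$, and $\mathbf{n}_2 = \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}$; let $H_1$ be the hyperplane (plane) in $\mathbb{R}^3$ passing through the point $\mathbf{p}_1$ and having normal vector $\mathbf{n}_1$, and let $H_2$ be the hyperplane passing through the point $\mathbf{p}_2$ and having normal vector $\mathbf{n}_2$. Give an explicit description of $H_1 \cap H_2$ by a formula that shows how to generate all points in $H_1 \cap H_2$.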


8.4 EXERCISES 

「 一 11 r 3 " 

1. Let L be the line in M. 2 through the points ^ and ^ . 

Find a linear functional / and a real number d such that 
L = [f ： d]. 


2 . 


Let L be the line in R 2 through the points 


and 



Find a linear functional / and a real number d such that 
$L = [f : d]$.

In Exercises 3 and 4, determine whether each set is open or closed or neither open nor closed.

3. a. {(x ， j) : y > 0} 

b. {(X ， y) : x = 2 and 1 < j < 3} 

c. {(x ， y) : x = 2 and 1 < 3 ; < 3} 

d. {(X ， y) \ xy = \ and x > 0} 

e. {(x, j) : xy > 1 and x > 0 } 

4. a. {(x, y) : x 2 y 2 = 1} 

b. {(x, y) : x 2 + y 2 > 1} 

c. {(x ， y) \ X 1 + y 1 <\ and y > 0} 

d. {(x ， y )： y > x 2 } 

e. {(X ， y):y < x 2 } 

In Exercises 5 and 6 , determine whether or not each set is compact 
and whether or not it is convex. 

5. Use the sets from Exercise 3. 

6 . Use the sets from Exercise 4. 


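In Exercises 7-10, let $H$ be the hyperplane through the listed points. (a) Find a vector $\mathbf{n}$ that is normal to the hyperplane. (b) Find a linear functional $f$ and a real number $d$ such that $H = [f : d]$.

7. $\begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix}$, $\begin{bmatrix} -1 \\ -2 \\ 5 \end{bmatrix}$

8. $\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 4 \\ -2 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 7 \\ -4 \\ 4 \end{bmatrix}$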


12 11 



0 0 0 1 




normal n and passing through p. Which of the points Vi, V 2 , 
and V 3 are on the same side of H as the origin, and which are 
not? 



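12. Let $\mathbf{a}_1 = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$, $\mathbf{a}_2 = \begin{bmatrix} 3 \\ 1 \\ 3 \end{bmatrix}$, $\mathbf{a}_3 = \begin{bmatrix} -1 \\ 6 \\ 0 \end{bmatrix}$, $\mathbf{b}_1 = \begin{bmatrix} 0 \\ 5 \\ -1 \end{bmatrix}$, $\mathbf{b}_2 = \begin{bmatrix} 1 \\ -3 \\ -2 \end{bmatrix}$, $\mathbf{b}_3 = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$, and $\mathbf{n} = \begin{bmatrix} 3 \\ 1 \\ -2 \end{bmatrix}$, and let $A = \{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\}$ and $B = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$. Find a hyperplane $H$ with normal $\mathbf{n}$ that separates $A$ and $B$. Is there a hyperplane parallel to $H$ that strictly separates $A$ and $B$?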


13. Let 



2" 


r 


" 1' 


-3 


2 


2 

Pi = 

1 

， P 2 = 

-1 

,n!= 

4 


2_ 


3 


2 


2 


and 


112 = , let H\ be the hyperplane in R 4 through pj with 


_5_ 

normal $\mathbf{n}_1$, and let $H_2$ be the hyperplane through $\mathbf{p}_2$ with normal $\mathbf{n}_2$. Give an explicit description of $H_1 \cap H_2$. [Hint: Find a point $\mathbf{p}$ in $H_1 \cap H_2$ and two linearly independent vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ that span a subspace parallel to the 2-dimensional flat $H_1 \cap H_2$.]

14. Let $F_1$ and $F_2$ be 4-dimensional flats in $\mathbb{R}^6$, and suppose that $F_1 \cap F_2 \ne \emptyset$. What are the possible dimensions of $F_1 \cap F_2$?

In Exercises 15-20, write a formula for a linear functional / and 
specify a number d, so that [f: d] is the hyperplane H described 
in the exercise. 

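15. Let $A$ be the $1 \times 4$ matrix $[\,1 \ \ {-3} \ \ 4 \ \ {-2}\,]$ and let $b = 5$. Let $H = \{\mathbf{x} \text{ in } \mathbb{R}^4 : A\mathbf{x} = b\}$.

16. Let $A$ be the $1 \times 5$ matrix $[\,2 \ \ 5 \ \ {-3} \ \ 0 \ \ 6\,]$. Note that Nul $A$ is in $\mathbb{R}^5$. Let $H = \operatorname{Nul} A$.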


17. Let H be the plane in R 3 spanned by the rows of B = 

^ 2 ^ . That is, H = Row B. [Hint: How is // 

related to Nul B1 See Section 6.1.] 

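18. Let $H$ be the plane in $\mathbb{R}^3$ spanned by the rows of $B = \begin{bmatrix} 1 & 4 & -5 \\ 0 & -2 & 8 \end{bmatrix}$. That is, $H = \operatorname{Row} B$.

19. Let $H$ be the column space of the matrix $B = \begin{bmatrix} 1 & 0 \\ 4 & 2 \\ -7 & -6 \end{bmatrix}$. That is, $H = \operatorname{Col} B$. [Hint: How is Col $B$ related to Nul $B^T$? See Section 6.1.]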

' 0 " 

20. Let H be the column space of the matrix B 


That is, H =Co\B. 


In Exercises 21 and 22, mark each statement True or False. Justify 
each answer. 


21. a. A linear transformation from $\mathbb{R}$ to $\mathbb{R}^n$ is called a linear functional.

b. If $f$ is a linear functional defined on $\mathbb{R}^n$, then there exists a real number $k$ such that $f(\mathbf{x}) = k\mathbf{x}$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

c. If a hyperplane strictly separates sets $A$ and $B$, then $A \cap B = \emptyset$.

d. If $A$ and $B$ are closed convex sets and $A \cap B = \emptyset$, then there exists a hyperplane that strictly separates $A$ and $B$.

22. a. If $d$ is a real number and $f$ is a nonzero linear functional defined on $\mathbb{R}^n$, then $[f : d]$ is a hyperplane in $\mathbb{R}^n$.

b. Given any vector $\mathbf{n}$ and any real number $d$, the set $\{\mathbf{x} : \mathbf{n} \cdot \mathbf{x} = d\}$ is a hyperplane.

c. If $A$ and $B$ are nonempty disjoint sets such that $A$ is compact and $B$ is closed, then there exists a hyperplane that strictly separates $A$ and $B$.

d. If there exists a hyperplane $H$ such that $H$ does not strictly separate two sets $A$ and $B$, then $(\operatorname{conv} A) \cap (\operatorname{conv} B) \ne \emptyset$.

23. Let $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{p}$ be the given points in $\mathbb{R}^2$ [vector entries not recoverable]. Find a hyperplane $[f : d]$ (in this case, a line) that strictly separates $\mathbf{p}$ from $\operatorname{conv}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

24. Repeat Exercise 23 for the given $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{p}$ [vector entries not recoverable].

25. Let $\mathbf{p} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$. Find a hyperplane $[f : d]$ that strictly separates $B(\mathbf{0}, 3)$ and $B(\mathbf{p}, 1)$. [Hint: After finding $f$, show that the point $\mathbf{v} = (1 - .75)\mathbf{0} + .75\mathbf{p}$ is neither in $B(\mathbf{0}, 3)$ nor in $B(\mathbf{p}, 1)$.]

26. Let $\mathbf{q}$ and $\mathbf{p}$ be the given points [vector entries not recoverable]. Find a hyperplane $[f : d]$ that strictly separates $B(\mathbf{q}, 3)$ and $B(\mathbf{p}, 1)$.

27. Give an example of a closed subset $S$ of $\mathbb{R}^2$ such that $\operatorname{conv} S$ is not closed.

28. Give an example of a compact set $A$ and a closed set $B$ in $\mathbb{R}^2$ such that $(\operatorname{conv} A) \cap (\operatorname{conv} B) = \emptyset$ but $A$ and $B$ cannot be strictly separated by a hyperplane.

29. Prove that the open ball $B(\mathbf{p}, \delta) = \{\mathbf{x} : \|\mathbf{x} - \mathbf{p}\| < \delta\}$ is a convex set. [Hint: Use the Triangle Inequality.]

30. Prove that the convex hull of a bounded set is bounded.


SOLUTION TO PRACTICE PROBLEM

First, compute $\mathbf{n}_1 \cdot \mathbf{p}_1 = -3$ and $\mathbf{n}_2 \cdot \mathbf{p}_2 = 7$. The hyperplane $H_1$ is the solution set of the equation $x_1 + x_2 - 2x_3 = -3$, and $H_2$ is the solution set of the equation $-2x_1 + x_2 + 3x_3 = 7$. Then

$$H_1 \cap H_2 = \{\mathbf{x} : x_1 + x_2 - 2x_3 = -3 \text{ and } -2x_1 + x_2 + 3x_3 = 7\}$$

This is an implicit description of $H_1 \cap H_2$. To find an explicit description, solve the system of equations by row reduction:

$$\begin{bmatrix} 1 & 1 & -2 & -3 \\ -2 & 1 & 3 & 7 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -\tfrac{5}{3} & -\tfrac{10}{3} \\ 0 & 1 & -\tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}$$

Thus $x_1 = -\tfrac{10}{3} + \tfrac{5}{3}x_3$, $x_2 = \tfrac{1}{3} + \tfrac{1}{3}x_3$, $x_3 = x_3$. Let $\mathbf{p} = \begin{bmatrix} -\tfrac{10}{3} \\ \tfrac{1}{3} \\ 0 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} \tfrac{5}{3} \\ \tfrac{1}{3} \\ 1 \end{bmatrix}$. The general solution can be written as $\mathbf{x} = \mathbf{p} + x_3\mathbf{v}$. Thus $H_1 \cap H_2$ is the line through $\mathbf{p}$ in the direction of $\mathbf{v}$. Note that $\mathbf{v}$ is orthogonal to both $\mathbf{n}_1$ and $\mathbf{n}_2$.


8.5 POLYTOPES

This section studies geometric properties of an important class of compact convex sets called polytopes. These sets arise in all sorts of applications, including game theory (Section 9.1), linear programming (Sections 9.2 to 9.4), and more general optimization problems, such as the design of feedback controls for engineering systems.

A polytope in $\mathbb{R}^n$ is the convex hull of a finite set of points. In $\mathbb{R}^2$, a polytope is simply a polygon. In $\mathbb{R}^3$, a polytope is called a polyhedron. Important features of a polyhedron are its faces, edges, and vertices. For example, the cube has 6 square faces, 12 edges, and 8 vertices. The following definitions provide terminology for higher dimensions as well as $\mathbb{R}^2$ and $\mathbb{R}^3$. Recall that the dimension of a set in $\mathbb{R}^n$ is the dimension of the smallest flat that contains it. Also, note that a polytope is a special type of compact convex set, because a finite set in $\mathbb{R}^n$ is compact and the convex hull of this set is compact, by the theorem in the topology terms and facts box in Section 8.4.


DEFINITION Let $S$ be a compact convex subset of $\mathbb{R}^n$. A nonempty subset $F$ of $S$ is called a (proper) face of $S$ if $F \ne S$ and there exists a hyperplane $H = [f : d]$ such that $F = S \cap H$ and either $f(S) \le d$ or $f(S) \ge d$. The hyperplane $H$ is called a supporting hyperplane to $S$. If the dimension of $F$ is $k$, then $F$ is called a $k$-face of $S$.

If $P$ is a polytope of dimension $k$, then $P$ is called a $k$-polytope. A 0-face of $P$ is called a vertex (plural: vertices), a 1-face is an edge, and a $(k - 1)$-dimensional face is a facet of $P$.


EXAMPLE 1 Suppose $S$ is a cube in $\mathbb{R}^3$. When a plane $H$ is translated through $\mathbb{R}^3$ until it just touches (supports) the cube but does not cut through the interior of the cube, there are three possibilities for $H \cap S$, depending on the orientation of $H$. (See Figure 1.)

$H \cap S$ may be a 2-dimensional square face (facet) of the cube.

$H \cap S$ may be a 1-dimensional edge of the cube.

$H \cap S$ may be a 0-dimensional vertex of the cube. ■

FIGURE 1 $H \cap S$ is 2-dimensional, 1-dimensional, or 0-dimensional.


Most applications of polytopes involve the vertices in some way, because they have a special property that is identified in the following definition.


DEFINITION Let $S$ be a convex set. A point $\mathbf{p}$ in $S$ is called an extreme point of $S$ if $\mathbf{p}$ is not in the interior of any line segment that lies in $S$. More precisely, if $\mathbf{x}, \mathbf{y} \in S$ and $\mathbf{p} \in \overline{\mathbf{x}\mathbf{y}}$, then $\mathbf{p} = \mathbf{x}$ or $\mathbf{p} = \mathbf{y}$. The set of all extreme points of $S$ is called the profile of $S$.

A vertex of any compact convex set $S$ is automatically an extreme point of $S$. This fact is proved during the proof of Theorem 14, below. In working with a polytope, say $P = \operatorname{conv}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ for $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in $\mathbb{R}^n$, it is usually helpful to know that $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are the extreme points of $P$. However, such a list might contain extraneous points. For example, some vector $\mathbf{v}_i$ could be the midpoint of an edge of the polytope. Of course, in this case $\mathbf{v}_i$ is not really needed to generate the convex hull. The following definition describes the property of the vertices that will make them all extreme points.


DEFINITION The set $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is a minimal representation of the polytope $P$ if $P = \operatorname{conv}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ and for each $i = 1, \ldots, k$, $\mathbf{v}_i \notin \operatorname{conv}\{\mathbf{v}_j : j \ne i\}$.


Every polytope has a minimal representation. For if $P = \operatorname{conv}\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ and if some $\mathbf{v}_i$ is a convex combination of the other points, then $\mathbf{v}_i$ may be deleted from the set of points without changing the convex hull. This process may be repeated until the minimal representation is left. It can be shown that the minimal representation is unique.


THEOREM 14 Suppose $M = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ is the minimal representation of the polytope $P$. Then the following three statements are equivalent:

a. $\mathbf{p} \in M$.

b. $\mathbf{p}$ is a vertex of $P$.

c. $\mathbf{p}$ is an extreme point of $P$.

FIGURE 2

PROOF (a) ⇒ (b) Suppose $\mathbf{p} \in M$ and let $Q = \operatorname{conv}\{\mathbf{v} : \mathbf{v} \in M \text{ and } \mathbf{v} \ne \mathbf{p}\}$. It follows from the definition of $M$ that $\mathbf{p} \notin Q$, and since $Q$ is compact, Theorem 13 implies the existence of a hyperplane $H'$ that strictly separates $\{\mathbf{p}\}$ and $Q$. Let $H$ be the hyperplane through $\mathbf{p}$ parallel to $H'$. See Fig. 2.

Then $Q$ lies in one of the closed half-spaces $H^+$ bounded by $H$ and so $P \subset H^+$. Thus $H$ supports $P$ at $\mathbf{p}$. Furthermore, $\mathbf{p}$ is the only point of $P$ that can lie on $H$, so $H \cap P = \{\mathbf{p}\}$ and $\mathbf{p}$ is a vertex of $P$.

(b) ⇒ (c) Let $\mathbf{p}$ be a vertex of $P$. Then there exists a hyperplane $H = [f : d]$ such that $H \cap P = \{\mathbf{p}\}$ and $f(P) \ge d$. If $\mathbf{p}$ were not an extreme point, then there would exist points $\mathbf{x}$ and $\mathbf{y}$ in $P$ such that $\mathbf{p} = (1 - c)\mathbf{x} + c\mathbf{y}$ with $0 < c < 1$. That is,

$$c\mathbf{y} = \mathbf{p} - (1 - c)\mathbf{x} \quad \text{and} \quad \mathbf{y} = \left(\tfrac{1}{c}\right)\mathbf{p} - \left(\tfrac{1}{c} - 1\right)\mathbf{x}$$

It follows that $f(\mathbf{y}) = \tfrac{1}{c} f(\mathbf{p}) - \left(\tfrac{1}{c} - 1\right) f(\mathbf{x})$. But $f(\mathbf{p}) = d$ and $f(\mathbf{x}) \ge d$, so

$$f(\mathbf{y}) \le \left(\tfrac{1}{c}\right)(d) - \left(\tfrac{1}{c} - 1\right)(d) = d$$

On the other hand, $\mathbf{y} \in P$, so $f(\mathbf{y}) \ge d$. It follows that $f(\mathbf{y}) = d$ and that $\mathbf{y} \in H \cap P$. This contradicts the fact that $\mathbf{p}$ is a vertex. So $\mathbf{p}$ must be an extreme point. (Note that this part of the proof does not depend on $P$ being a polytope. It holds for any compact convex set.)

(c) ⇒ (a) It is clear that any extreme point of $P$ must be a member of $M$. ■

EXAMPLE 2 Recall that the profile of a set $S$ is the set of extreme points of $S$. Theorem 14 shows that the profile of a polygon in $\mathbb{R}^2$ is the set of vertices. (See Fig. 3.) The profile of a closed ball is its boundary. An open set has no extreme points, so its profile is empty. A closed half-space has no extreme points, so its profile is empty. ■

FIGURE 3


Exercise 18 asks you to show that a point $\mathbf{p}$ in a convex set $S$ is an extreme point of $S$ if and only if, when $\mathbf{p}$ is removed from $S$, the remaining points still form a convex set. It follows that if $S^*$ is any subset of $S$ such that $\operatorname{conv} S^*$ is equal to $S$, then $S^*$ must contain the profile of $S$. The sets in Example 2 show that in general $S^*$ may have to be larger than the profile of $S$. It is true, however, that when $S$ is compact we may actually take $S^*$ to be the profile of $S$, as Theorem 15 will show. Thus every nonempty compact convex set $S$ has an extreme point, and the set of all extreme points is the smallest subset of $S$ whose convex hull is equal to $S$.


THEOREM 15 Let $S$ be a nonempty compact convex set. Then $S$ is the convex hull of its profile (the set of extreme points of $S$).

PROOF The proof is by induction on the dimension of the set $S$.$^1$ ■

One important application of Theorem 15 is the following theorem. It is one of the key theoretical results in the development of linear programming. Linear functionals are continuous, and continuous functions always attain their maximum and minimum on a compact set. The significance of Theorem 16 is that for compact convex sets, the maximum (and minimum) is actually attained at an extreme point of $S$.

THEOREM 16 Let $f$ be a linear functional defined on a nonempty compact convex set $S$. Then there exist extreme points $\hat{\mathbf{v}}$ and $\hat{\mathbf{w}}$ of $S$ such that

$$f(\hat{\mathbf{v}}) = \max_{\mathbf{v} \in S} f(\mathbf{v}) \quad \text{and} \quad f(\hat{\mathbf{w}}) = \min_{\mathbf{v} \in S} f(\mathbf{v})$$


PROOF Assume that $f$ attains its maximum $m$ on $S$ at some point $\mathbf{v}'$ in $S$. That is, $f(\mathbf{v}') = m$. We wish to show that there exists an extreme point in $S$ with the same property. By Theorem 15, $\mathbf{v}'$ is a convex combination of the extreme points of $S$. That is, there exist extreme points $\mathbf{v}_1, \ldots, \mathbf{v}_k$ of $S$ and nonnegative $c_1, \ldots, c_k$ such that

$$\mathbf{v}' = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k \quad \text{with} \quad c_1 + \cdots + c_k = 1$$

If none of the extreme points of $S$ satisfies $f(\mathbf{v}) = m$, then

$$f(\mathbf{v}_i) < m \quad \text{for } i = 1, \ldots, k$$

since $m$ is the maximum of $f$ on $S$. But then, because $f$ is linear,

$$m = f(\mathbf{v}') = f(c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k) = c_1 f(\mathbf{v}_1) + \cdots + c_k f(\mathbf{v}_k) < c_1 m + \cdots + c_k m = m(c_1 + \cdots + c_k) = m$$

This contradiction implies that some extreme point $\hat{\mathbf{v}}$ of $S$ must satisfy $f(\hat{\mathbf{v}}) = m$.

The proof for $\hat{\mathbf{w}}$ is similar. ■

1 The details may be found in Steven R. Lay, Convex Sets and Their Applications (New York: John Wiley & Sons, 1982; Mineola, NY: Dover Publications, 2007), p. 43.


EXAMPLE 3 Given points $\mathbf{p}_1 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$, $\mathbf{p}_2 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$, and $\mathbf{p}_3 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ in $\mathbb{R}^2$, let $S = \operatorname{conv}\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$. For each linear functional $f$, find the maximum value $m$ of $f$ on the set $S$, and find all points $\mathbf{x}$ in $S$ at which $f(\mathbf{x}) = m$.

a. $f_1(x_1, x_2) = x_1 + x_2$    b. $f_2(x_1, x_2) = -3x_1 + x_2$    c. $f_3(x_1, x_2) = x_1 + 2x_2$

SOLUTION By Theorem 16, the maximum value is attained at one of the extreme points of $S$. So to find $m$, evaluate $f$ at each extreme point and select the largest value.

a. $f_1(\mathbf{p}_1) = -1$, $f_1(\mathbf{p}_2) = 4$, and $f_1(\mathbf{p}_3) = 3$, so $m_1 = 4$. Graph the line $f_1(x_1, x_2) = m_1$, that is, $x_1 + x_2 = 4$, and note that $\mathbf{x} = \mathbf{p}_2$ is the only point in $S$ at which $f_1(\mathbf{x}) = 4$. See Fig. 4(a).

b. $f_2(\mathbf{p}_1) = 3$, $f_2(\mathbf{p}_2) = -8$, and $f_2(\mathbf{p}_3) = -1$, so $m_2 = 3$. Graph the line $f_2(x_1, x_2) = m_2$, that is, $-3x_1 + x_2 = 3$, and note that $\mathbf{x} = \mathbf{p}_1$ is the only point in $S$ at which $f_2(\mathbf{x}) = 3$. See Fig. 4(b).

c. $f_3(\mathbf{p}_1) = -1$, $f_3(\mathbf{p}_2) = 5$, and $f_3(\mathbf{p}_3) = 5$, so $m_3 = 5$. Graph the line $f_3(x_1, x_2) = m_3$, that is, $x_1 + 2x_2 = 5$. Here, $f_3$ attains its maximum value at $\mathbf{p}_2$, at $\mathbf{p}_3$, and at every point in the convex hull of $\mathbf{p}_2$ and $\mathbf{p}_3$. See Fig. 4(c). ■

FIGURE 4 (a) $x_1 + x_2 = 4$    (b) $-3x_1 + x_2 = 3$    (c) $x_1 + 2x_2 = 5$
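Theorem 16 reduces the maximization in Example 3 to a finite comparison. The following Python sketch is an illustration only; the function name `maximize_on_vertices` is hypothetical. It evaluates a functional at the extreme points and reports the maximum together with every extreme point where it is attained.

```python
def maximize_on_vertices(f, vertices):
    """Return (m, maximizers): the largest value of f and the vertices attaining it."""
    values = [f(p) for p in vertices]
    m = max(values)
    return m, [p for p, val in zip(vertices, values) if val == m]

vertices = [(-1, 0), (3, 1), (1, 2)]          # p1, p2, p3 of Example 3

def f1(x): return x[0] + x[1]
def f2(x): return -3 * x[0] + x[1]
def f3(x): return x[0] + 2 * x[1]

result1 = maximize_on_vertices(f1, vertices)
result2 = maximize_on_vertices(f2, vertices)
result3 = maximize_on_vertices(f3, vertices)
```

When more than one vertex is returned, as for $f_3$, the maximum is also attained on the whole convex hull of those vertices, which matches part (c) of the example.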


The situation illustrated in Example 3 for $\mathbb{R}^2$ also applies in higher dimensions. The maximum value of a linear functional $f$ on a polytope $P$ occurs at the intersection of a supporting hyperplane and $P$. This intersection is either a single extreme point of $P$, or the convex hull of 2 or more extreme points of $P$. In either case, the intersection is a polytope, and its extreme points form a subset of the extreme points of $P$.

By definition, a polytope is the convex hull of a finite set of points. This is an explicit representation of the polytope since it identifies points in the set. A polytope may also be represented implicitly as the intersection of a finite number of closed half-spaces. Example 4 illustrates this in $\mathbb{R}^2$.

EXAMPLE 4 Let

$$\mathbf{p}_1 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \mathbf{p}_3 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$$

in $\mathbb{R}^2$, and let $S = \operatorname{conv}\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$. Simple algebra shows that the line through $\mathbf{p}_1$ and $\mathbf{p}_2$ is given by $x_1 + x_2 = 1$, and $S$ is on the side of this line where

$$x_1 + x_2 \ge 1 \quad \text{or, equivalently,} \quad -x_1 - x_2 \le -1.$$

Similarly, the line through $\mathbf{p}_2$ and $\mathbf{p}_3$ is $x_1 - x_2 = 1$, and $S$ is on the side where

$$x_1 - x_2 \le 1$$

Also, the line through $\mathbf{p}_3$ and $\mathbf{p}_1$ is $-x_1 + 3x_2 = 3$, and $S$ is on the side where

$$-x_1 + 3x_2 \le 3.$$

See Figure 5. It follows that $S$ can be described as the solution set of the system of linear inequalities

$$\begin{aligned} -x_1 - x_2 &\le -1 \\ x_1 - x_2 &\le 1 \\ -x_1 + 3x_2 &\le 3 \end{aligned}$$

This system may be written as $A\mathbf{x} \le \mathbf{b}$, where

$$A = \begin{bmatrix} -1 & -1 \\ 1 & -1 \\ -1 & 3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} -1 \\ 1 \\ 3 \end{bmatrix}$$

Note that an inequality between two vectors, such as $A\mathbf{x}$ and $\mathbf{b}$, applies to each of the corresponding coordinates in those vectors. ■



In Chapter 9, it will be necessary to replace an implicit description of a polytope by a minimal representation of the polytope, listing all the extreme points of the polytope. In simple cases, a graphical solution is feasible. The following example shows how to handle the situation when several points of interest are too close to identify easily on a graph.

EXAMPLE 5 Let $P$ be the set of points in $\mathbb{R}^2$ that satisfy $A\mathbf{x} \le \mathbf{b}$, where

$$A = \begin{bmatrix} 1 & 3 \\ 1 & 1 \\ 3 & 2 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 18 \\ 8 \\ 21 \end{bmatrix}$$

and $\mathbf{x} \ge \mathbf{0}$. Find the minimal representation of $P$.

SOLUTION The condition $\mathbf{x} \ge \mathbf{0}$ places $P$ in the first quadrant of $\mathbb{R}^2$, a typical condition in linear programming problems. The three inequalities in $A\mathbf{x} \le \mathbf{b}$ involve three boundary lines:

(1) $x_1 + 3x_2 = 18$    (2) $x_1 + x_2 = 8$    (3) $3x_1 + 2x_2 = 21$

All three lines have negative slopes, so a general idea of the shape of $P$ is easy to visualize. Even a rough sketch of the graphs of these lines will reveal that $(0, 0)$, $(7, 0)$, and $(0, 6)$ are vertices of the polytope $P$.

What about the intersections of the lines (1), (2), and (3)? Sometimes it is clear from the graph which intersections to include. But if not, then the following algebraic procedure will work well:

When an intersection point is found that corresponds to two inequalities, test it in the other inequalities to see whether the point is in the polytope.

The intersection of (1) and (2) is $\mathbf{p}_{12} = (3, 5)$. Both coordinates are nonnegative, so $\mathbf{p}_{12}$ satisfies all inequalities except possibly the third inequality. Test this:

$$3(3) + 2(5) = 19 < 21$$

This intersection point satisfies the inequality for (3), so $\mathbf{p}_{12}$ is in the polytope.

The intersection of (2) and (3) is $\mathbf{p}_{23} = (5, 3)$. This satisfies all inequalities except possibly the inequality for (1). Test this:

$$1(5) + 3(3) = 14 < 18$$

This shows that $\mathbf{p}_{23}$ is in the polytope.

Finally, the intersection of (1) and (3) is $\mathbf{p}_{13} = \left(\tfrac{27}{7}, \tfrac{33}{7}\right)$. Test this in the inequality for (2):

$$1\left(\tfrac{27}{7}\right) + 1\left(\tfrac{33}{7}\right) = \tfrac{60}{7} \approx 8.6 > 8$$

Thus $\mathbf{p}_{13}$ does not satisfy the second inequality, which shows that $\mathbf{p}_{13}$ is not in $P$. In conclusion, the minimal representation of the polytope $P$ is

$$\left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 7 \\ 0 \end{bmatrix}, \begin{bmatrix} 5 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ 5 \end{bmatrix}, \begin{bmatrix} 0 \\ 6 \end{bmatrix} \right\}$$ ■
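The algebraic procedure of Example 5 (intersect two boundary lines, then test the point in the remaining inequalities) can be sketched in Python as follows. This is an illustration under stated assumptions, not a general-purpose algorithm: it is written for a bounded region in $\mathbb{R}^2$, the names `intersection` and `feasible` are hypothetical, and the conditions $x_1 \ge 0$, $x_2 \ge 0$ are encoded as two extra inequalities. Exact fractions avoid rounding trouble at points such as $\mathbf{p}_{13}$.

```python
from fractions import Fraction
from itertools import combinations

# Each triple (a1, a2, b) encodes the inequality a1*x1 + a2*x2 <= b.
# The last two rows encode x1 >= 0 and x2 >= 0.
constraints = [(1, 3, 18), (1, 1, 8), (3, 2, 21), (-1, 0, 0), (0, -1, 0)]

def intersection(c1, c2):
    """Solve the two boundary equations by Cramer's rule; None if parallel."""
    a, b, e = c1
    c, d, f = c2
    det = a * d - b * c
    if det == 0:
        return None
    return (Fraction(e * d - b * f, det), Fraction(a * f - e * c, det))

def feasible(point):
    """True when the point satisfies every inequality of the system."""
    x1, x2 = point
    return all(a1 * x1 + a2 * x2 <= b for a1, a2, b in constraints)

candidates = (intersection(c1, c2) for c1, c2 in combinations(constraints, 2))
vertices = sorted({p for p in candidates if p is not None and feasible(p)})
p13 = intersection(constraints[0], constraints[2])
```

The sorted list `vertices` contains the five points of the minimal representation, and `p13` is rejected by `feasible` because it violates inequality (2).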


The remainder of this section discusses the construction of two basic polytopes 
in R 3 (and higher dimensions). The first appears in linear programming problems, 
the subject of Chapter 9. Both poly topes provide opportunities to visualize R 4 in a 
remarkable way. 


Simplex 

A simplex is the convex hull of an affinely independent finite set of vectors. To construct 
a /c-dimensional simplex (or A:-simplex), proceed as follows: 

0-simplex S°: a single point {vi} 

1- simplex S l : conv(5° U {V 2 })，with \2 not in aff S° 

2- simplex S 2 : conv(5 1 U {y 3 }), with y 3 not in aff S 1 


^-simplex S k : con\(S k ~ 1 U {va ： +i}), with y^+i not in aff S k 一 1 

The simplex S 1 is a line segment. The triangle S 2 comes from choosing a point V 3 
that is not in the line containing and then forming the convex hull with S 2 . (See 
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FIGURE 6 


Fig. 6 .) The tetrahedron ^3 is produced by choosing a point V 4 not in the plane of S 2 
and then forming the convex hull with S 2 . 

Before continuing, consider some of the patterns that are appearing. The triangle
$S^2$ has three edges. Each of these edges is a line segment like $S^1$. Where do these
three line segments come from? One of them is $S^1$. One of them comes by joining the
endpoint $\mathbf{v}_2$ to the new point $\mathbf{v}_3$. The third comes from joining the other endpoint $\mathbf{v}_1$ to
$\mathbf{v}_3$. You might say that each endpoint in $S^1$ is stretched out into a line segment in $S^2$.

The tetrahedron $S^3$ in Fig. 6 has four triangular faces. One of these is the original
triangle $S^2$, and the other three come from stretching the edges of $S^2$ out to the new
point $\mathbf{v}_4$. Notice too that the vertices of $S^2$ get stretched out into edges in $S^3$. The
other edges in $S^3$ come from the edges in $S^2$. This suggests how to “visualize” the
four-dimensional $S^4$.

The construction of $S^4$, called a pentatope, involves forming the convex hull of $S^3$
with a point $\mathbf{v}_5$ not in the 3-space of $S^3$. A complete picture is impossible, of course,
but Fig. 7 is suggestive: $S^4$ has five vertices, and any four of the vertices determine a
facet in the shape of a tetrahedron. For example, the figure emphasizes the facet with
vertices $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_4$, and $\mathbf{v}_5$ and the facet with vertices $\mathbf{v}_2$, $\mathbf{v}_3$, $\mathbf{v}_4$, and $\mathbf{v}_5$. There are five
such facets. Figure 7 identifies all ten edges of $S^4$, and these can be used to visualize
the ten triangular faces.

FIGURE 7 The 4-dimensional simplex $S^4$ projected onto $\mathbb{R}^2$, with two
tetrahedral facets emphasized.

Figure 8 shows another representation of the 4-dimensional simplex $S^4$. This time
the fifth vertex appears “inside” the tetrahedron $S^3$. The highlighted tetrahedral facets
also appear to be “inside” $S^3$.

FIGURE 8 The fifth vertex of $S^4$ is “inside” $S^3$.


Hypercube 

Let $I_i = \overline{\mathbf{0}\mathbf{e}_i}$ be the line segment from the origin $\mathbf{0}$ to the standard basis vector $\mathbf{e}_i$ in $\mathbb{R}^n$.
Then for k such that $1 \le k \le n$, the vector sum$^2$

$$C^k = I_1 + I_2 + \cdots + I_k$$

is called a k-dimensional hypercube.

To visualize the construction of $C^k$, start with the simple cases. The hypercube $C^1$
is the line segment $I_1$. If $C^1$ is translated by $\mathbf{e}_2$, the convex hull of its initial and final
positions describes a square $C^2$. (See Fig. 9.) Translating $C^2$ by $\mathbf{e}_3$ creates
the cube $C^3$. A similar translation of $C^3$ by the vector $\mathbf{e}_4$ yields the 4-dimensional
hypercube $C^4$.

Again, this is hard to visualize, but Fig. 10 shows a 2-dimensional projection of $C^4$.
Each of the edges of $C^3$ is stretched into a square face of $C^4$. And each of the square
faces of $C^3$ is stretched into a cubic face of $C^4$. Figure 11 shows three facets of $C^4$.
Part (a) highlights the cube that comes from the left square face of $C^3$. Part (b) shows
the cube that comes from the front square face of $C^3$. And part (c) emphasizes the cube
that comes from the top square face of $C^3$.


2 The vector sum of two sets A and B is defined by $A + B = \{\mathbf{c} : \mathbf{c} = \mathbf{a} + \mathbf{b} \text{ for some } \mathbf{a} \in A \text{ and } \mathbf{b} \in B\}$.

FIGURE 9 Constructing the cube $C^3$.

FIGURE 10 $C^4$ projected onto $\mathbb{R}^2$.

FIGURE 11 Three of the cubic facets of $C^4$.

Figure 12 shows another representation of $C^4$ in which the translated cube is placed
“inside” $C^3$. This makes it easier to visualize the cubic facets of $C^4$, since there is less
distortion.

FIGURE 12 The translated image of $C^3$ is placed “inside” $C^3$ to obtain $C^4$.


Altogether, the 4-dimensional cube $C^4$ has eight cubic faces. Two come from the
original and translated images of $C^3$, and six come from the square faces of $C^3$ that are
stretched into cubes. The square 2-dimensional faces of $C^4$ come from the square faces
of $C^3$ and its translate, and the edges of $C^3$ that are stretched into squares. Thus there
are $2 \times 6 + 12 = 24$ square faces. To count the edges, take 2 times the number of edges
in $C^3$ and add the number of vertices in $C^3$. This makes $2 \times 12 + 8 = 32$ edges in $C^4$.
The vertices in $C^4$ all come from $C^3$ and its translate, so there are $2 \times 8 = 16$ vertices.
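
The counting argument above can be sketched as a short program. This is only an illustration: the function name is hypothetical, and two conventions are assumptions made here rather than taken from the text, namely that the list starts from a single point (a "0-dimensional cube") and that the cube itself is counted as its one top-dimensional face.

```python
def translate_and_stretch(counts):
    """Face counts of C^(n+1) from those of C^n.

    counts[k] is the number of k-faces of C^n for k = 0..n, where
    counts[n] == 1 counts the cube itself (an assumed convention).
    """
    n = len(counts) - 1
    new_counts = []
    for k in range(n + 2):
        # k-faces lying in the original cube and in its translate
        copies = 2 * counts[k] if k <= n else 0
        # (k-1)-faces of C^n that are stretched into k-faces
        stretched = counts[k - 1] if k >= 1 else 0
        new_counts.append(copies + stretched)
    return new_counts


counts = [1]                      # a single point
for _ in range(4):                # point -> C^1 -> C^2 -> C^3 -> C^4
    counts = translate_and_stretch(counts)
print(counts)                     # vertices, edges, squares, cubes, C^4 itself
```

The first four entries of the final list are the numbers of vertices, edges, square faces, and cubic faces of the 4-dimensional cube derived in the paragraph above.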

One of the truly remarkable results in the study of polytopes is the following formula,
first proved by Leonhard Euler (1707-1783). It establishes a simple relationship
between the number of faces of different dimensions in a polytope. To simplify the
statement of the formula, let $f_k(P)$ denote the number of k-dimensional faces of an
n-dimensional polytope $P$.$^3$

Euler’s formula: $$\sum_{k=0}^{n-1} (-1)^k f_k(P) = 1 + (-1)^{n-1}$$

In particular, when $n = 3$, $v - e + f = 2$, where $v$, $e$, and $f$ denote the number of
vertices, edges, and facets (respectively) of $P$.
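
As a quick check (a worked instance added for illustration), the face counts found earlier in this section can be substituted into the formula. For $n = 4$ the right side is $1 + (-1)^3 = 0$, and for $n = 3$ it is $1 + (-1)^2 = 2$:

$$
\begin{aligned}
S^4 &: \quad 5 - 10 + 10 - 5 = 0 \\
C^4 &: \quad 16 - 32 + 24 - 8 = 0 \\
C^3 &: \quad 8 - 12 + 6 = 2
\end{aligned}
$$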

PRACTICE PROBLEM 


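1. Find the minimal representation of the polytope $P$ defined by the inequalities $A\mathbf{x} \le \mathbf{b}$
and $\mathbf{x} \ge \mathbf{0}$, when $A = \begin{bmatrix} 1 & 3 \\ 1 & 2 \\ 2 & 1 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 12 \\ 9 \\ 12 \end{bmatrix}$.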


8.5 EXERCISES 


1. Given points pi = ◦ , p 2 = 

let S = conv {p l9 p 2 , p 3 }. For each linear functional /, find 
the maximum value m of / on the set S, and find all points 
x in 5 at which /(x) = m. 

a. / (^1 , X 2 ) = X\ — X 2 b. f(X\,X 2 ) = Xi + X 2 

C. /(Xi,X2) = —3xi + X 2 


and p 3 


in E 2 


2 . 


3. 


Given points p t = 



,and p 3 


inR 2 , 


let S = conv {p 1? p 2 , p 3 }. For each linear functional /, find 
the maximum value m of / on the set S, and find all points 
x in 5 at which /(x) = m. 

a. / (xi, x 2 ) = Xi x 2 b. f(x \, x 2 ) = X\ — x 2 

c. /( 又 1 , X 2 ) = —2xi + X 2 

Repeat Exercise 1 where m is the minimum value of / on 


instead of the maximum value. 


4. Repeat Exercise 2 where m is the minimum value of / on 5 
instead of the maximum value. 


In Exercises 5-8, find the minimal representation of the polytope
defined by the inequalities $A\mathbf{x} \le \mathbf{b}$ and $\mathbf{x} \ge \mathbf{0}$.




10 

15 


6 . 

7. 

8 . 

9. 


,b 


,b 


18 

16 

18' 

10 

28 


Let $S = \{(x, y) : x^2 + (y - 1)^2 \le 1\} \cup \{(3, 0)\}$. Is the origin
an extreme point of conv $S$? Is the origin a vertex of
conv $S$?


10. Find an example of a closed convex set $S$ in $\mathbb{R}^2$ such that its
profile $P$ is nonempty but conv $P \ne S$.

11. Find an example of a bounded convex set $S$ in $\mathbb{R}^2$ such that
its profile $P$ is nonempty but conv $P \ne S$.

12. a. Determine the number of k-faces of the 5-dimensional
simplex $S^5$ for $k = 0, 1, \ldots, 4$. Verify that your answer
satisfies Euler’s formula.

b. Make a chart of the values of $f_k(S^n)$ for $n = 1, \ldots, 5$ and
$k = 0, 1, \ldots, 4$. Can you see a pattern? Guess a general
formula for $f_k(S^n)$.


3 A proof is presented in Steven R. Lay, Convex Sets and Their Applications (New York: John Wiley &
Sons, 1982; Mineola, NY: Dover Publications, 2007), p. 131.

13. a. Determine the number of k-faces of the 5-dimensional
hypercube $C^5$ for $k = 0, 1, \ldots, 4$. Verify that your answer
satisfies Euler’s formula.

b. Make a chart of the values of $f_k(C^n)$ for $n = 1, \ldots, 5$ and
$k = 0, 1, \ldots, 4$. Can you see a pattern? Guess a general
formula for $f_k(C^n)$.

14. Suppose $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are linearly independent vectors in
$\mathbb{R}^n$ ($1 \le k \le n$). Then the set $X^k = \operatorname{conv}\{\pm\mathbf{v}_1, \ldots, \pm\mathbf{v}_k\}$
is called a k-crosspolytope.

a. Sketch $X^1$ and $X^2$.

b. Determine the number of k-faces of the 3-dimensional
crosspolytope $X^3$ for $k = 0, 1, 2$. What is another name
for $X^3$?

c. Determine the number of k-faces of the 4-dimensional
crosspolytope $X^4$ for $k = 0, 1, 2, 3$. Verify that your
answer satisfies Euler’s formula.

d. Find a formula for $f_k(X^n)$, the number of k-faces of $X^n$,
for $0 \le k \le n - 1$.

15. A k-pyramid $P^k$ is the convex hull of a $(k - 1)$-polytope
$Q$ and a point $\mathbf{x} \notin \operatorname{aff} Q$. Find a formula for each of the
following in terms of $f_j(Q)$, $j = 0, \ldots, n - 1$.

a. The number of vertices of $P^n$: $f_0(P^n)$.

b. The number of k-faces of $P^n$: $f_k(P^n)$, for $1 \le k \le n - 2$.

c. The number of $(n - 1)$-dimensional facets of $P^n$: $f_{n-1}(P^n)$.

In Exercises 16 and 17, mark each statement True or False. Justify
each answer.

16. a. A polytope is the convex hull of a finite set of points.

b. Let $\mathbf{p}$ be an extreme point of a convex set $S$. If $\mathbf{u}, \mathbf{v} \in S$,
$\mathbf{p} \in \overline{\mathbf{u}\mathbf{v}}$, and $\mathbf{p} \ne \mathbf{u}$, then $\mathbf{p} = \mathbf{v}$.

c. If $S$ is a nonempty convex subset of $\mathbb{R}^n$, then $S$ is the
convex hull of its profile.

d. The 4-dimensional simplex $S^4$ has exactly five facets,
each of which is a 3-dimensional tetrahedron.


17. a. A cube in $\mathbb{R}^3$ has five facets.

b. A point $\mathbf{p}$ is an extreme point of a polytope $P$ if and only
if $\mathbf{p}$ is a vertex of $P$.

c. If $S$ is a nonempty compact convex set and a linear
functional attains its maximum at a point $\mathbf{p}$, then $\mathbf{p}$ is an
extreme point of $S$.

d. A 2-dimensional polytope always has the same number
of vertices and edges.

18. Let $\mathbf{v}$ be an element of the convex set $S$. Prove that $\mathbf{v}$ is an
extreme point of $S$ if and only if the set $\{\mathbf{x} \in S : \mathbf{x} \ne \mathbf{v}\}$ is
convex.

19. If $c \in \mathbb{R}$ and $S$ is a set, define $cS = \{c\mathbf{x} : \mathbf{x} \in S\}$. Let $S$
be a convex set and suppose $c > 0$ and $d > 0$. Prove that
$cS + dS = (c + d)S$.

20. Find an example to show that the convexity of $S$ is necessary
in Exercise 19.

21. If $A$ and $B$ are convex sets, prove that $A + B$ is convex.

22. A polyhedron (3-polytope) is called regular if all its facets
are congruent regular polygons and all the angles at the
vertices are equal. Supply the details in the following proof
that there are only five regular polyhedra.

a. Suppose that a regular polyhedron has $r$ facets, each of
which is a k-sided regular polygon, and that $s$ edges
meet at each vertex. Letting $v$ and $e$ denote the numbers
of vertices and edges in the polyhedron, explain why
$kr = 2e$ and $sv = 2e$.

b. Use Euler’s formula to show that $\dfrac{1}{s} + \dfrac{1}{k} = \dfrac{1}{2} + \dfrac{1}{e}$.

c. Find all the integral solutions of the equation in part
(b) that satisfy the geometric constraints of the problem.
(How small can $k$ and $s$ be?)

For your information, the five regular polyhedra are the tetrahedron
(4, 6, 4), the cube (8, 12, 6), the octahedron (6, 12, 8),
the dodecahedron (20, 30, 12), and the icosahedron (12, 30,
20). (The numbers in parentheses indicate the numbers of
vertices, edges, and faces, respectively.)


SOLUTION TO PRACTICE PROBLEM 


1. The matrix inequality $A\mathbf{x} \le \mathbf{b}$ yields the following system of inequalities:

(a) $x_1 + 3x_2 \le 12$

(b) $x_1 + 2x_2 \le 9$

(c) $2x_1 + x_2 \le 12$

The condition $\mathbf{x} \ge \mathbf{0}$ places the polytope in the first quadrant of the plane. One
vertex is (0, 0). The $x_1$-intercepts of the three lines (when $x_2 = 0$) are 12, 9, and 6,
so (6, 0) is a vertex. The $x_2$-intercepts of the three lines (when $x_1 = 0$) are 4, 4.5,
and 12, so (0, 4) is a vertex.

How do the three boundary lines intersect for positive values of $x_1$ and $x_2$? The
intersection of (a) and (b) is at $\mathbf{p}_{ab} = (3, 3)$. Testing $\mathbf{p}_{ab}$ in (c) gives $2(3) + 1(3) =
9 < 12$, so $\mathbf{p}_{ab}$ is in $P$. The intersection of (b) and (c) is at $\mathbf{p}_{bc} = (5, 2)$. Testing $\mathbf{p}_{bc}$
in (a) gives $1(5) + 3(2) = 11 < 12$, so $\mathbf{p}_{bc}$ is in $P$. The intersection of (a) and (c)
is at $\mathbf{p}_{ac} = (4.8, 2.4)$. Testing $\mathbf{p}_{ac}$ in (b) gives $1(4.8) + 2(2.4) = 9.6 > 9$. So $\mathbf{p}_{ac}$ is
not in $P$.

Finally, the five vertices (extreme points) of the polytope are (0, 0), (6, 0),
(5, 2), (3, 3), and (0, 4). These points form the minimal representation of $P$. This is
displayed graphically in Fig. 13.
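
The procedure used in this solution (intersect pairs of boundary lines, then test each intersection point in the remaining inequalities) can be sketched in code. This is a brute-force illustration for polygons in the plane only, not a general algorithm; the function name `polygon_vertices` is hypothetical, and exact fractions are used so that the inequality tests are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def polygon_vertices(A, b):
    """Vertices of {x in R^2 : Ax <= b, x >= 0}, found by brute force."""
    # Store each constraint as (a1, a2, c), meaning a1*x1 + a2*x2 <= c.
    rows = [(Fraction(a1), Fraction(a2), Fraction(c))
            for (a1, a2), c in zip(A, b)]
    rows.append((Fraction(-1), Fraction(0), Fraction(0)))   # -x1 <= 0
    rows.append((Fraction(0), Fraction(-1), Fraction(0)))   # -x2 <= 0

    vertices = set()
    for (a1, a2, c1), (b1, b2, c2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue                    # parallel boundary lines do not meet
        # Solve the 2 x 2 system for the intersection point (Cramer's rule).
        x1 = (c1 * b2 - a2 * c2) / det
        x2 = (a1 * c2 - b1 * c1) / det
        # Keep the point only if it satisfies every inequality.
        if all(r1 * x1 + r2 * x2 <= c for r1, r2, c in rows):
            vertices.add((x1, x2))
    return sorted(vertices)


# The practice problem: A and b as given above.
print(polygon_vertices([[1, 3], [1, 2], [2, 1]], [12, 9, 12]))
```

Applied to the practice problem, the function returns the same five points found above, listed in sorted order rather than in order around the boundary.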



FIGURE 13 


8.6 CURVES AND SURFACES 


For thousands of years, builders used long thin strips of wood to create the hull of a boat. 
In more recent times, designers used long, flexible metal strips to lay out the surfaces of 
cars and airplanes. Weights and pegs shaped the strips into smooth curves called natural 
cubic splines. The curve between two successive control points (pegs or weights) has 
a parametric representation using cubic polynomials. Unfortunately, such curves have 
the property that moving one control point affects the shape of the entire curve, because 
of physical forces that the pegs and weights exert on the strip. Design engineers had 
long wanted local control of the curve—in which movement of one control point would 
affect only a small portion of the curve. In 1962, a French automotive engineer, Pierre 
Bezier, solved this problem by adding extra control points and using a class of curves 
now called by his name. 

Bezier Curves 

The curves described below play an important role in computer graphics as well as 
engineering. For example, they are used in Adobe Illustrator and Macromedia Freehand, 
and in application programming languages such as OpenGL. These curves permit a 
program to store exact information about curved segments and surfaces in a relatively 
small number of control points. All graphics commands for the segments and surfaces 
have only to be computed for the control points. The special structure of these curves 
also speeds up other calculations in the “graphics pipeline” that creates the final display 
on the viewing screen. 

Exercises in Section 8.3 introduced quadratic Bezier curves and showed one method
for constructing Bezier curves of higher degree. The discussion here focuses on quadratic
and cubic Bezier curves, which are determined by three or four control points, denoted
by $\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$. These points can be in $\mathbb{R}^2$ or $\mathbb{R}^3$, or they can be represented by
homogeneous forms in $\mathbb{R}^3$ or $\mathbb{R}^4$. The standard parametric descriptions of these curves,
for $0 \le t \le 1$, are

$$\mathbf{w}(t) = (1 - t)^2\mathbf{p}_0 + 2t(1 - t)\mathbf{p}_1 + t^2\mathbf{p}_2 \tag{1}$$

$$\mathbf{x}(t) = (1 - t)^3\mathbf{p}_0 + 3t(1 - t)^2\mathbf{p}_1 + 3t^2(1 - t)\mathbf{p}_2 + t^3\mathbf{p}_3 \tag{2}$$

Figure 1 shows two typical curves. Usually, the curves pass through only the initial and 
terminal control points, but a Bezier curve is always in the convex hull of its control 
points. (See Exercises 21-24 in Section 8.3.) 
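
Formulas (1) and (2) translate directly into code. The following is a minimal sketch; the function names and the sample control points are hypothetical, and points are represented as plain tuples of coordinates.

```python
def bezier_quadratic(t, p0, p1, p2):
    """Evaluate w(t) from equation (1) for 0 <= t <= 1."""
    w0, w1, w2 = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(p0, p1, p2))


def bezier_cubic(t, p0, p1, p2, p3):
    """Evaluate x(t) from equation (2) for 0 <= t <= 1."""
    weights = ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3)
    return tuple(sum(w * c for w, c in zip(weights, coords))
                 for coords in zip(p0, p1, p2, p3))


# Hypothetical control points in R^2.
p0, p1, p2, p3 = (0, 0), (1, 2), (3, 2), (4, 0)
print(bezier_cubic(0, p0, p1, p2, p3))    # the initial control point
print(bezier_cubic(1, p0, p1, p2, p3))    # the terminal control point
print(bezier_cubic(0.5, p0, p1, p2, p3))  # a point between them
```

Evaluating at t = 0 and t = 1 returns the first and last control points, which illustrates why the curve passes through them while generally missing the interior control points.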



FIGURE 1 Quadratic and cubic Bezier curves.



Bezier curves are useful in computer graphics because their essential properties are
preserved under the action of linear transformations and translations. For instance, if
$A$ is a matrix of appropriate size, then from the linearity of matrix multiplication, for
$0 \le t \le 1$,

$$
\begin{aligned}
A\mathbf{x}(t) &= A[(1 - t)^3\mathbf{p}_0 + 3t(1 - t)^2\mathbf{p}_1 + 3t^2(1 - t)\mathbf{p}_2 + t^3\mathbf{p}_3] \\
&= (1 - t)^3 A\mathbf{p}_0 + 3t(1 - t)^2 A\mathbf{p}_1 + 3t^2(1 - t) A\mathbf{p}_2 + t^3 A\mathbf{p}_3
\end{aligned}
$$

The new control points are $A\mathbf{p}_0, \ldots, A\mathbf{p}_3$. Translations of Bezier curves are considered
in Exercise 1.
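
A numerical check of this property may be helpful. In the sketch below, the matrix (a rotation by 90 degrees), the control points, and the helper names are all hypothetical choices made for illustration; the check compares transforming a point of the curve with evaluating the curve built from the transformed control points.

```python
def bezier_cubic(t, points):
    """Evaluate the cubic Bezier curve with the four given control points."""
    weights = ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3)
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points))
                 for i in range(dim))


def mat_vec(A, p):
    """Multiply the matrix A (a list of rows) by the vector p."""
    return tuple(sum(a * x for a, x in zip(row, p)) for row in A)


A = [[0, -1], [1, 0]]                          # rotation by 90 degrees
controls = [(0, 0), (1, 2), (3, 2), (4, 0)]    # sample control points
t = 0.25

transformed_point = mat_vec(A, bezier_cubic(t, controls))
point_from_new_controls = bezier_cubic(t, [mat_vec(A, p) for p in controls])
print(transformed_point, point_from_new_controls)
```

The two printed points agree up to rounding, so a graphics program needs to transform only the four control points rather than every point of the curve.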

The curves in Fig. 1 suggest that the control points determine the tangent lines to
the curves at the initial and terminal control points. Recall from calculus that for any
parametric curve, say $\mathbf{y}(t)$, the direction of the tangent line to the curve at a point $\mathbf{y}(t)$
is given by the derivative $\mathbf{y}'(t)$, called the tangent vector of the curve. (This derivative
is computed entry by entry.)


EXAMPLE 1 Determine how the tangent vector of the quadratic Bezier curve $\mathbf{w}(t)$
is related to the control points of the curve, at $t = 0$ and $t = 1$.


SOLUTION Write the weights in equation (1) as simple polynomials

$$\mathbf{w}(t) = (1 - 2t + t^2)\mathbf{p}_0 + (2t - 2t^2)\mathbf{p}_1 + t^2\mathbf{p}_2$$

Then, because differentiation is a linear transformation on functions,

$$
\begin{aligned}
\mathbf{w}'(t) &= (-2 + 2t)\mathbf{p}_0 + (2 - 4t)\mathbf{p}_1 + 2t\mathbf{p}_2 \\
\mathbf{w}'(0) &= -2\mathbf{p}_0 + 2\mathbf{p}_1 = 2(\mathbf{p}_1 - \mathbf{p}_0) \\
\mathbf{w}'(1) &= -2\mathbf{p}_1 + 2\mathbf{p}_2 = 2(\mathbf{p}_2 - \mathbf{p}_1)
\end{aligned}
$$

The tangent vector at $\mathbf{p}_0$, for instance, points from $\mathbf{p}_0$ to $\mathbf{p}_1$, but it is twice as long
as the segment from $\mathbf{p}_0$ to $\mathbf{p}_1$. Notice that $\mathbf{w}'(0) = \mathbf{0}$ when $\mathbf{p}_1 = \mathbf{p}_0$. In this case,
$\mathbf{w}(t) = (1 - t^2)\mathbf{p}_1 + t^2\mathbf{p}_2$, and the graph of $\mathbf{w}(t)$ is the line segment from $\mathbf{p}_1$
to $\mathbf{p}_2$. ■







Connecting Two Bezier Curves 

Two basic Bezier curves can be joined end to end, with the terminal point of the first
curve $\mathbf{x}(t)$ being the initial point $\mathbf{p}_2$ of the second curve $\mathbf{y}(t)$. The combined curve is
said to have $G^0$ geometric continuity (at $\mathbf{p}_2$) because the two segments join at $\mathbf{p}_2$. If the
tangent line to curve 1 at $\mathbf{p}_2$ has a different direction than the tangent line to curve 2,
then a “corner,” or abrupt change of direction, may be apparent at $\mathbf{p}_2$. See Fig. 2.



To avoid a sharp bend, it usually suffices to adjust the curves to have what is called
$G^1$ geometric continuity, where both tangent vectors at $\mathbf{p}_2$ point in the same direction.
That is, the derivatives $\mathbf{x}'(1)$ and $\mathbf{y}'(0)$ point in the same direction, even though their
magnitudes may be different. When the tangent vectors are actually equal at $\mathbf{p}_2$, the
tangent vector is continuous at $\mathbf{p}_2$, and the combined curve is said to have $C^1$ continuity,
or $C^1$ parametric continuity. Figure 3 shows $G^1$ continuity in (a) and $C^1$ continuity
in (b).



FIGURE 3 (a) $G^1$ continuity and (b) $C^1$ continuity.

EXAMPLE 2 Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ determine two quadratic Bezier curves, with control
points $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$ and $\{\mathbf{p}_2, \mathbf{p}_3, \mathbf{p}_4\}$, respectively. The curves are joined at $\mathbf{p}_2 = \mathbf{x}(1) =
\mathbf{y}(0)$.

a. Suppose the combined curve has $G^1$ continuity (at $\mathbf{p}_2$). What algebraic restriction
does this condition impose on the control points? Express this restriction in geometric
language.

b. Repeat part (a) for $C^1$ continuity.

SOLUTION

a. From Example 1, $\mathbf{x}'(1) = 2(\mathbf{p}_2 - \mathbf{p}_1)$. Also, using the control points for $\mathbf{y}(t)$ in
place of $\mathbf{w}(t)$, Example 1 shows that $\mathbf{y}'(0) = 2(\mathbf{p}_3 - \mathbf{p}_2)$. $G^1$ continuity means that
$\mathbf{y}'(0) = k\mathbf{x}'(1)$ for some positive constant $k$. Equivalently,

$$\mathbf{p}_3 - \mathbf{p}_2 = k(\mathbf{p}_2 - \mathbf{p}_1), \quad \text{with } k > 0 \tag{3}$$

Geometrically, (3) implies that $\mathbf{p}_2$ lies on the line segment from $\mathbf{p}_1$ to $\mathbf{p}_3$. To
prove this, let $t = (k + 1)^{-1}$, and note that $0 < t < 1$. Solve for $k$ to obtain
$k = (1 - t)/t$. When this expression is used for $k$ in (3), a rearrangement shows
that $\mathbf{p}_2 = (1 - t)\mathbf{p}_1 + t\mathbf{p}_3$, which verifies the assertion about $\mathbf{p}_2$.

b. $C^1$ continuity means that $\mathbf{y}'(0) = \mathbf{x}'(1)$. Thus $2(\mathbf{p}_3 - \mathbf{p}_2) = 2(\mathbf{p}_2 - \mathbf{p}_1)$, so
$\mathbf{p}_3 - \mathbf{p}_2 = \mathbf{p}_2 - \mathbf{p}_1$, and $\mathbf{p}_2 = (\mathbf{p}_1 + \mathbf{p}_3)/2$. Geometrically, $\mathbf{p}_2$ is the midpoint of
the line segment from $\mathbf{p}_1$ to $\mathbf{p}_3$. See Fig. 3. ■
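
The midpoint condition in part (b) suggests a simple way to choose the first interior control point of a second quadratic curve so that the join is $C^1$. The sketch below is only an illustration; the helper names and the sample points are hypothetical. It uses the derivative formula $\mathbf{w}'(t) = (-2 + 2t)\mathbf{p}_0 + (2 - 4t)\mathbf{p}_1 + 2t\mathbf{p}_2$ from Example 1.

```python
def quad_tangent(t, p0, p1, p2):
    """Tangent vector w'(t) of the quadratic Bezier curve (Example 1)."""
    return tuple((-2 + 2 * t) * a + (2 - 4 * t) * b + 2 * t * c
                 for a, b, c in zip(p0, p1, p2))


def c1_next_control_point(p1, p2):
    """Return p3 = 2*p2 - p1, which makes p2 the midpoint of p1 and p3."""
    return tuple(2 * b - a for a, b in zip(p1, p2))


# First curve: p0, p1, p2.  Second curve: p2, p3, p4 (p4 is free).
p0, p1, p2 = (0, 0), (1, 2), (3, 3)
p3 = c1_next_control_point(p1, p2)
p4 = (8, 1)

print(p3)
print(quad_tangent(1, p0, p1, p2))   # x'(1)
print(quad_tangent(0, p2, p3, p4))   # y'(0)
```

The two tangent vectors printed at the end are equal regardless of the choice of the last control point, which is what $C^1$ continuity at the join requires.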

Figure 4 shows $C^1$ continuity for two cubic Bezier curves. Notice how the point
joining the two segments lies in the middle of the line segment between the adjacent
control points.



Two curves have $C^2$ (parametric) continuity when they have $C^1$ continuity and the
second derivatives $\mathbf{x}''(1)$ and $\mathbf{y}''(0)$ are equal. This is possible for cubic Bezier curves,
but it severely limits the positions of the control points. Another class of cubic curves,
called B-splines, always have $C^2$ continuity because each pair of curves share three
control points rather than one. Graphics figures using B-splines have more control points
and consequently require more computations. Some exercises for this section examine
these curves.

Surprisingly, if $\mathbf{x}(t)$ and $\mathbf{y}(t)$ join at $\mathbf{p}_3$, the apparent smoothness of the curve at
$\mathbf{p}_3$ is usually the same for both $G^1$ continuity and $C^1$ continuity. This is because the
magnitude of $\mathbf{x}'(t)$ is not related to the physical shape of the curve. The magnitude
reflects only the mathematical parameterization of the curve. For instance, if a new
vector function $\mathbf{z}(t)$ equals $\mathbf{x}(2t)$, then the point $\mathbf{z}(t)$ traverses the curve from $\mathbf{p}_0$ to $\mathbf{p}_3$
twice as fast as the original version, because $2t$ reaches 1 when $t$ is .5. But, by the chain
rule of calculus, $\mathbf{z}'(t) = 2 \cdot \mathbf{x}'(2t)$, so the tangent vector to $\mathbf{z}(t)$ at $\mathbf{p}_3$ is twice the tangent
vector to $\mathbf{x}(t)$ at $\mathbf{p}_3$.

In practice, many simple Bezier curves are joined to create graphics objects. Typesetting
programs provide one important application, because many letters in a type font
involve curved segments. Each letter in a PostScript® font, for example, is stored as a
set of control points, along with information on how to construct the “outline” of the
letter using line segments and Bezier curves. Enlarging such a letter basically requires
multiplying the coordinates of each control point by one constant scale factor. Once the
outline of the letter has been computed, the appropriate solid parts of the letter are filled
in. Figure 5 illustrates this for a character in a PostScript font. Note the control points.

FIGURE 5 A PostScript character.


Matrix Equations for Bezier Curves 


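Since a Bezier curve is a linear combination of control points using polynomials as
weights, the formula for $\mathbf{x}(t)$ may be written as

$$
\begin{aligned}
\mathbf{x}(t) &= \begin{bmatrix} \mathbf{p}_0 & \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 \end{bmatrix}
\begin{bmatrix} (1 - t)^3 \\ 3t(1 - t)^2 \\ 3t^2(1 - t) \\ t^3 \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{p}_0 & \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 \end{bmatrix}
\begin{bmatrix} 1 - 3t + 3t^2 - t^3 \\ 3t - 6t^2 + 3t^3 \\ 3t^2 - 3t^3 \\ t^3 \end{bmatrix} \\
&= \begin{bmatrix} \mathbf{p}_0 & \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 \end{bmatrix}
\begin{bmatrix} 1 & -3 & 3 & -1 \\ 0 & 3 & -6 & 3 \\ 0 & 0 & 3 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ t \\ t^2 \\ t^3 \end{bmatrix}
\end{aligned}
$$

The matrix whose columns are the four control points is called a geometry matrix, $G$.
The $4 \times 4$ matrix of polynomial coefficients is the Bezier basis matrix, $M_B$. If $\mathbf{u}(t)$ is
the column vector of powers of $t$, then the Bezier curve is given by

$$\mathbf{x}(t) = GM_B\mathbf{u}(t) \tag{4}$$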
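
The factorization in (4) can be checked numerically. The sketch below is an illustration with hypothetical names (`M_B`, `bezier_weights`, `bezier_point`) and sample control points; the geometry matrix is stored as a list of its columns, and no matrix library is assumed.

```python
# Bezier basis matrix from the text, stored as a list of rows.
M_B = [[1, -3, 3, -1],
       [0, 3, -6, 3],
       [0, 0, 3, -3],
       [0, 0, 0, 1]]


def bezier_weights(t):
    """Compute M_B u(t), where u(t) = (1, t, t^2, t^3)."""
    u = [1, t, t ** 2, t ** 3]
    return [sum(m * p for m, p in zip(row, u)) for row in M_B]


def bezier_point(G_columns, t):
    """Compute x(t) = G M_B u(t); G_columns lists the four control points."""
    w = bezier_weights(t)
    dim = len(G_columns[0])
    return tuple(sum(wi * p[i] for wi, p in zip(w, G_columns))
                 for i in range(dim))


G_columns = [(0, 0), (1, 2), (3, 2), (4, 0)]   # sample control points
print(bezier_weights(0.5))         # matches (1-t)^3, 3t(1-t)^2, 3t^2(1-t), t^3
print(bezier_point(G_columns, 0.5))
```

Replacing `M_B` by another basis matrix would change the family of curves without changing the rest of the computation, which is the point of writing the curve in this form.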

Other parametric cubic curves in computer graphics are written in this form, too.
For instance, if the entries in the matrix $M_B$ are changed appropriately, the resulting
curves are B-splines. They are “smoother” than Bezier curves, but they do not pass
through any of the control points. A Hermite cubic curve arises when the matrix $M_B$
is replaced by a Hermite basis matrix. In this case, the columns of the geometry matrix
consist of the starting and ending points of the curves and the tangent vectors to the
curves at those points.$^1$

1 The term basis matrix comes from the rows of the matrix that list the coefficients of the blending
polynomials used to define the curve. For a cubic Bezier curve, the four polynomials are $(1 - t)^3$, $3t(1 - t)^2$,
$3t^2(1 - t)$, and $t^3$. They form a basis for the space $\mathbb{P}_3$ of polynomials of degree 3 or less. Each entry in the
vector $\mathbf{x}(t)$ is a linear combination of these polynomials. The weights come from the rows of the geometry
matrix $G$ in (4).

The Bezier curve in equation (4) can also be “factored” in another way, to be used
in the discussion of Bezier surfaces. For convenience later, the parameter $t$ is replaced
by a parameter $s$:

$$
\begin{aligned}
\mathbf{x}(s) &= \mathbf{u}(s)^T M_B^T \begin{bmatrix} \mathbf{p}_0 \\ \mathbf{p}_1 \\ \mathbf{p}_2 \\ \mathbf{p}_3 \end{bmatrix}
= \begin{bmatrix} 1 & s & s^2 & s^3 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ -3 & 3 & 0 & 0 \\ 3 & -6 & 3 & 0 \\ -1 & 3 & -3 & 1 \end{bmatrix}
\begin{bmatrix} \mathbf{p}_0 \\ \mathbf{p}_1 \\ \mathbf{p}_2 \\ \mathbf{p}_3 \end{bmatrix} \\
&= \begin{bmatrix} (1 - s)^3 & 3s(1 - s)^2 & 3s^2(1 - s) & s^3 \end{bmatrix}
\begin{bmatrix} \mathbf{p}_0 \\ \mathbf{p}_1 \\ \mathbf{p}_2 \\ \mathbf{p}_3 \end{bmatrix}
\end{aligned}
\tag{5}
$$

This formula is not quite the same as the transpose of the product on the right of
(4), because $\mathbf{x}(s)$ and the control points appear in (5) without transpose symbols. The
matrix of control points in (5) is called a geometry vector. This should be viewed as a
$4 \times 1$ block (partitioned) matrix whose entries are column vectors. The matrix to the left
of the geometry vector, in the second part of (5), can be viewed as a block matrix, too,
with a scalar in each block. The partitioned matrix multiplication makes sense, because
each (vector) entry in the geometry vector can be left-multiplied by a scalar as well as
by a matrix. Thus, the column vector $\mathbf{x}(s)$ is represented by (5).


Bezier Surfaces

A 3D bicubic surface patch can be constructed from a set of four Bezier curves. Consider
the four geometry matrices

$$
\begin{bmatrix} \mathbf{p}_{11} & \mathbf{p}_{12} & \mathbf{p}_{13} & \mathbf{p}_{14} \end{bmatrix} \\
\begin{bmatrix} \mathbf{p}_{21} & \mathbf{p}_{22} & \mathbf{p}_{23} & \mathbf{p}_{24} \end{bmatrix} \\
\begin{bmatrix} \mathbf{p}_{31} & \mathbf{p}_{32} & \mathbf{p}_{33} & \mathbf{p}_{34} \end{bmatrix} \\
\begin{bmatrix} \mathbf{p}_{41} & \mathbf{p}_{42} & \mathbf{p}_{43} & \mathbf{p}_{44} \end{bmatrix}
$$

and recall from equation (4) that a Bezier curve is produced when any one of these
matrices is multiplied on the right by the following vector of weights:

$$M_B\mathbf{u}(t) = \begin{bmatrix} (1 - t)^3 \\ 3t(1 - t)^2 \\ 3t^2(1 - t) \\ t^3 \end{bmatrix}$$

Let $G$ be the block (partitioned) $4 \times 4$ matrix whose entries are the control points $\mathbf{p}_{ij}$
displayed above. Then the following product is a block $4 \times 1$ matrix, and each entry is
a Bezier curve:

$$
GM_B\mathbf{u}(t) =
\begin{bmatrix}
\mathbf{p}_{11} & \mathbf{p}_{12} & \mathbf{p}_{13} & \mathbf{p}_{14} \\
\mathbf{p}_{21} & \mathbf{p}_{22} & \mathbf{p}_{23} & \mathbf{p}_{24} \\
\mathbf{p}_{31} & \mathbf{p}_{32} & \mathbf{p}_{33} & \mathbf{p}_{34} \\
\mathbf{p}_{41} & \mathbf{p}_{42} & \mathbf{p}_{43} & \mathbf{p}_{44}
\end{bmatrix}
\begin{bmatrix} (1 - t)^3 \\ 3t(1 - t)^2 \\ 3t^2(1 - t) \\ t^3 \end{bmatrix}
$$

In fact,

$$
GM_B\mathbf{u}(t) =
\begin{bmatrix}
(1 - t)^3\mathbf{p}_{11} + 3t(1 - t)^2\mathbf{p}_{12} + 3t^2(1 - t)\mathbf{p}_{13} + t^3\mathbf{p}_{14} \\
(1 - t)^3\mathbf{p}_{21} + 3t(1 - t)^2\mathbf{p}_{22} + 3t^2(1 - t)\mathbf{p}_{23} + t^3\mathbf{p}_{24} \\
(1 - t)^3\mathbf{p}_{31} + 3t(1 - t)^2\mathbf{p}_{32} + 3t^2(1 - t)\mathbf{p}_{33} + t^3\mathbf{p}_{34} \\
(1 - t)^3\mathbf{p}_{41} + 3t(1 - t)^2\mathbf{p}_{42} + 3t^2(1 - t)\mathbf{p}_{43} + t^3\mathbf{p}_{44}
\end{bmatrix}
$$

Now fix $t$. Then $GM_B\mathbf{u}(t)$ is a column vector that can be used as a geometry vector
in equation (5) for a Bezier curve in another variable $s$. This observation produces the
Bezier bicubic surface:

$$\mathbf{x}(s, t) = \mathbf{u}(s)^T M_B^T G M_B \mathbf{u}(t), \quad \text{where } 0 \le s, t \le 1 \tag{6}$$

The formula for $\mathbf{x}(s, t)$ is a linear combination of the sixteen control points. If one
imagines that these control points are arranged in a fairly uniform rectangular array, as
in Fig. 6, then the Bezier surface is controlled by a web of eight Bezier curves, four
in the “s-direction” and four in the “t-direction.” The surface actually passes through
the four control points at its “corners.” When it is in the middle of a larger surface, the
sixteen-point surface shares its twelve boundary control points with its neighbors.
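
Formula (6) amounts to a double weighted sum of the sixteen control points, which the following sketch evaluates directly. The names `bernstein3` and `bezier_patch` and the sample grid of control points are hypothetical; `G[i][j]` stands for the control point in row i + 1 and column j + 1 of the block matrix G.

```python
def bernstein3(t):
    """The four cubic weights (1-t)^3, 3t(1-t)^2, 3t^2(1-t), t^3."""
    return [(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3]


def bezier_patch(s, t, G):
    """Evaluate the bicubic surface x(s, t) of formula (6).

    G is a 4 x 4 nested list of control points (tuples of coordinates).
    """
    ws, wt = bernstein3(s), bernstein3(t)
    dim = len(G[0][0])
    return tuple(sum(ws[i] * wt[j] * G[i][j][c]
                     for i in range(4) for j in range(4))
                 for c in range(dim))


# A sample grid of sixteen control points in R^3.
G = [[(i, j, i * j) for j in range(4)] for i in range(4)]

print(bezier_patch(0, 0, G), bezier_patch(1, 1, G))   # two opposite corners
print(bezier_patch(0.5, 0.5, G))                      # an interior point
```

Evaluating at the four parameter corners returns four of the control points, in agreement with the remark that the surface passes through the control points at its corners.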


FIGURE 6 Sixteen control points for a Bezier
bicubic surface patch.


Approximations to Curves and Surfaces 

In CAD programs and in programs used to create realistic computer games, the designer 
often works at a graphics workstation to compose a “scene” involving various geometric 
structures. This process requires interaction between the designer and the geometric 
objects. Each slight repositioning of an object requires new mathematical computations 
by the graphics program. Bezier curves and surfaces can be useful in this process 
because they involve fewer control points than objects approximated by many polygons. 
This dramatically reduces the computation time and speeds up the designer’s work. 

After the scene composition, however, the final image preparation has different
computational demands that are more easily met by objects consisting of flat surfaces
and straight edges, such as polyhedra. The designer needs to render the scene, by
introducing light sources, adding color and texture to surfaces, and simulating reflections
from the surfaces.

Computing the direction of a reflected light at a point $\mathbf{p}$ on a surface, for instance,
requires knowing the directions of both the incoming light and the surface normal, the
vector perpendicular to the tangent plane at $\mathbf{p}$. Computing such normal vectors is much
easier on a surface composed of, say, tiny flat polygons than on a curved surface whose
normal vector changes continuously as $\mathbf{p}$ moves. If $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are adjacent vertices
of a flat polygon, then the surface normal is just plus or minus the cross product
$(\mathbf{p}_2 - \mathbf{p}_1) \times (\mathbf{p}_2 - \mathbf{p}_3)$. When the polygon is small, only one normal vector is needed for
rendering the entire polygon. Also, two widely used shading routines, Gouraud shading
and Phong shading, both require a surface to be defined by polygons.
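
The cross-product formula for the normal of a flat polygon is easy to compute directly. In this sketch the function name and the sample vertices are hypothetical, and the result is not normalized to unit length.

```python
def polygon_normal(p1, p2, p3):
    """Cross product (p2 - p1) x (p2 - p3) for adjacent vertices in R^3.

    The surface normal is plus or minus this vector.
    """
    a = [x - y for x, y in zip(p2, p1)]   # p2 - p1
    b = [x - y for x, y in zip(p2, p3)]   # p2 - p3
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


# Three adjacent vertices of a polygon lying in the plane x3 = 0.
print(polygon_normal((1, 0, 0), (0, 0, 0), (0, 1, 0)))
```

For vertices in the plane $x_3 = 0$ the result is a multiple of $\mathbf{e}_3$, as expected for a vector perpendicular to that plane.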

As a result of these needs for flat surfaces, the Bezier curves and surfaces from the
scene composition stage now are usually approximated by straight line segments and
polyhedral surfaces. The basic idea for approximating a Bezier curve or surface is to
divide the curve or surface into smaller pieces, with more and more control points.

Recursive Subdivision of Bezier Curves and Surfaces 

Figure 7 shows the four control points $\mathbf{p}_0, \ldots, \mathbf{p}_3$ for a Bezier curve, along with control
points for two new curves, each coinciding with half of the original curve. The “left”
curve begins at $\mathbf{q}_0 = \mathbf{p}_0$ and ends at $\mathbf{q}_3$, at the midpoint of the original curve. The “right”
curve begins at $\mathbf{r}_0 = \mathbf{q}_3$ and ends at $\mathbf{r}_3 = \mathbf{p}_3$.



Figure 8 shows how the new control points enclose regions that are “thinner” than 
the region enclosed by the original control points. As the distances between the control 
points decrease, the control points of each curve segment also move closer to a line 
segment. This variation-diminishing property of Bezier curves depends on the fact that 
a Bezier curve always lies in the convex hull of the control points. 



The new control points are related to the original control points by simple formulas.
Of course, $\mathbf{q}_0 = \mathbf{p}_0$ and $\mathbf{r}_3 = \mathbf{p}_3$. The midpoint of the original curve $\mathbf{x}(t)$ occurs at $\mathbf{x}(.5)$
when $\mathbf{x}(t)$ has the standard parameterization,

$$\mathbf{x}(t) = (1 - 3t + 3t^2 - t^3)\mathbf{p}_0 + (3t - 6t^2 + 3t^3)\mathbf{p}_1 + (3t^2 - 3t^3)\mathbf{p}_2 + t^3\mathbf{p}_3 \tag{7}$$

for $0 \le t \le 1$. Thus, the new control points $\mathbf{q}_3$ and $\mathbf{r}_0$ are given by

$$\mathbf{q}_3 = \mathbf{r}_0 = \mathbf{x}(.5) = \tfrac{1}{8}(\mathbf{p}_0 + 3\mathbf{p}_1 + 3\mathbf{p}_2 + \mathbf{p}_3) \tag{8}$$

The formulas for the remaining “interior” control points are also simple, but the derivation
of the formulas requires some work involving the tangent vectors of the curves. By
definition, the tangent vector to a parameterized curve $\mathbf{x}(t)$ is the derivative $\mathbf{x}'(t)$. This
vector shows the direction of the line tangent to the curve at $\mathbf{x}(t)$. For the Bezier curve
in (7),

$$\mathbf{x}'(t) = (-3 + 6t - 3t^2)\mathbf{p}_0 + (3 - 12t + 9t^2)\mathbf{p}_1 + (6t - 9t^2)\mathbf{p}_2 + 3t^2\mathbf{p}_3$$

for $0 \le t \le 1$. In particular,

$$\mathbf{x}'(0) = 3(\mathbf{p}_1 - \mathbf{p}_0) \quad \text{and} \quad \mathbf{x}'(1) = 3(\mathbf{p}_3 - \mathbf{p}_2) \tag{9}$$

Geometrically, $\mathbf{p}_1$ is on the line tangent to the curve at $\mathbf{p}_0$, and $\mathbf{p}_2$ is on the line tangent
to the curve at $\mathbf{p}_3$. See Fig. 8. Also, from $\mathbf{x}'(t)$, compute

$$\mathbf{x}'(.5) = \tfrac{3}{4}(-\mathbf{p}_0 - \mathbf{p}_1 + \mathbf{p}_2 + \mathbf{p}_3) \tag{10}$$

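Let $\mathbf{y}(t)$ be the Bezier curve determined by $\mathbf{q}_0, \ldots, \mathbf{q}_3$, and let $\mathbf{z}(t)$ be the Bezier curve
determined by $\mathbf{r}_0, \ldots, \mathbf{r}_3$. Since $\mathbf{y}(t)$ traverses the same path as $\mathbf{x}(t)$ but only gets to
$\mathbf{x}(.5)$ as $t$ goes from 0 to 1, $\mathbf{y}(t) = \mathbf{x}(.5t)$ for $0 \le t \le 1$. Similarly, since $\mathbf{z}(t)$ starts at
$\mathbf{x}(.5)$ when $t = 0$, $\mathbf{z}(t) = \mathbf{x}(.5 + .5t)$ for $0 \le t \le 1$. By the chain rule for derivatives,

$$\mathbf{y}'(t) = .5\mathbf{x}'(.5t) \quad \text{and} \quad \mathbf{z}'(t) = .5\mathbf{x}'(.5 + .5t) \quad \text{for } 0 \le t \le 1 \tag{11}$$

From (9) with $\mathbf{y}'(0)$ in place of $\mathbf{x}'(0)$, from (11) with $t = 0$, and from (9), the control
points for $\mathbf{y}(t)$ satisfy

$$3(\mathbf{q}_1 - \mathbf{q}_0) = \mathbf{y}'(0) = .5\mathbf{x}'(0) = \tfrac{3}{2}(\mathbf{p}_1 - \mathbf{p}_0) \tag{12}$$

From (9) with $\mathbf{y}'(1)$ in place of $\mathbf{x}'(1)$, from (11) with $t = 1$, and from (10),

$$3(\mathbf{q}_3 - \mathbf{q}_2) = \mathbf{y}'(1) = .5\mathbf{x}'(.5) = \tfrac{3}{8}(-\mathbf{p}_0 - \mathbf{p}_1 + \mathbf{p}_2 + \mathbf{p}_3) \tag{13}$$

Equations (8), (9), (10), (12), and (13) can be solved to produce the formulas for $\mathbf{q}_0, \ldots,
\mathbf{q}_3$ shown in Exercise 13. Geometrically, the formulas are displayed in Fig. 9. The
interior control points $\mathbf{q}_1$ and $\mathbf{r}_2$ are the midpoints, respectively, of the segment from $\mathbf{p}_0$
to $\mathbf{p}_1$ and the segment from $\mathbf{p}_2$ to $\mathbf{p}_3$. When the midpoint of the segment from $\mathbf{p}_1$ to $\mathbf{p}_2$
is connected to $\mathbf{q}_1$, the resulting line segment has $\mathbf{q}_2$ in the middle!
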



This completes one step of the subdivision process. The “recursion” begins, and 
both new curves are subdivided. The recursion continues to a depth at which all curves 
are sufficiently straight. Alternatively, at each step the recursion can be “adaptive” and 
not subdivide one of the two new curves if that curve is sufficiently straight. Once the 
subdivision completely stops, the endpoints of each curve are joined by line segments, 
and the scene is ready for the next step in the final image preparation. 
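
One step of the subdivision can be sketched using only midpoints, following the geometric description above. The text states the midpoint rules for $\mathbf{q}_1$, $\mathbf{r}_2$, and $\mathbf{q}_2$; taking $\mathbf{r}_1$ by the symmetric rule, and computing $\mathbf{q}_3$ as the midpoint of $\mathbf{q}_2$ and $\mathbf{r}_1$ (which agrees algebraically with (8)), are assumptions made for this illustration. The function names and sample points are hypothetical.

```python
def midpoint(a, b):
    """Midpoint of the segment from a to b."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def subdivide(p0, p1, p2, p3):
    """Split a cubic Bezier curve into its 'left' and 'right' halves."""
    q1 = midpoint(p0, p1)
    m = midpoint(p1, p2)          # midpoint of the segment from p1 to p2
    r2 = midpoint(p2, p3)
    q2 = midpoint(q1, m)
    r1 = midpoint(m, r2)          # symmetric counterpart of q2 (assumed)
    q3 = midpoint(q2, r1)         # equals (p0 + 3p1 + 3p2 + p3)/8, as in (8)
    return (p0, q1, q2, q3), (q3, r1, r2, p3)


def bezier_cubic(t, pts):
    """Evaluate the cubic Bezier curve with control points pts at t."""
    w = ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3)
    return tuple(sum(wi * p[i] for wi, p in zip(w, pts))
                 for i in range(len(pts[0])))


controls = ((0, 0), (0, 4), (4, 4), (4, 0))    # sample control points
left, right = subdivide(*controls)
print(left)
print(right)
```

A recursive version would call `subdivide` again on `left` and `right` until each set of control points is sufficiently close to a line segment; the flatness test itself is left open here.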

A Bezier bicubic surface has the same variation-diminishing property as the Bezier
curves that make up each cross-section of the surface, so the process described above
can be applied in each cross-section. With the details omitted, here is the basic strategy.
Consider the four “parallel” Bezier curves whose parameter is $s$, and apply the subdivision
process to each of them. This produces four sets of eight control points; each set
determines a curve as $s$ varies from 0 to 1. As $t$ varies, however, there are eight curves,
each with four control points. Apply the subdivision process to each of these sets of
four points, creating a total of 64 control points. Adaptive recursion is possible in this
setting, too, but there are some subtleties involved.$^2$


2 See Foley, van Dam, Feiner, and Hughes, Computer Graphics—Principles and Practice, 2nd Ed. (Boston:
Addison-Wesley, 1996), pp. 527-528.

PRACTICE PROBLEMS


A spline usually refers to a curve that passes through specified points. A B-spline,
however, usually does not pass through its control points. A single segment has the
parametric form

$$\mathbf{x}(t) = \tfrac{1}{6}\left[(1 - t)^3\mathbf{p}_0 + (3t^3 - 6t^2 + 4)\mathbf{p}_1 + (-3t^3 + 3t^2 + 3t + 1)\mathbf{p}_2 + t^3\mathbf{p}_3\right] \tag{14}$$

for $0 \le t \le 1$, where $\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are the control points. When $t$ varies from 0 to 1,
$\mathbf{x}(t)$ creates a short curve that lies close to $\overline{\mathbf{p}_1\mathbf{p}_2}$. Basic algebra shows that the B-spline
formula can also be written as

$$\mathbf{x}(t) = \tfrac{1}{6}\left[(1 - t)^3\mathbf{p}_0 + (3t(1 - t)^2 - 3t + 4)\mathbf{p}_1 + (3t^2(1 - t) + 3t + 1)\mathbf{p}_2 + t^3\mathbf{p}_3\right]$$

This shows the similarity with the Bezier curve. Except for the 1/6 factor at the front,
the $\mathbf{p}_0$ and $\mathbf{p}_3$ terms are the same. The $\mathbf{p}_1$ component has been increased by $-3t + 4$
and the $\mathbf{p}_2$ component has been increased by $3t + 1$. These components move the curve
closer to $\overline{\mathbf{p}_1\mathbf{p}_2}$ than the Bezier curve. The 1/6 factor is necessary to keep the sum of the
coefficients equal to 1. Figure 10 compares a B-spline with a Bezier curve that has the
same control points.



FIGURE 10 A B-spline segment and a Bezier curve. 
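The comparison between the two curve types can be checked numerically. The sketch below is illustrative only: the helper names `bspline_weights`, `bezier_weights`, and `point_on_curve`, and the sample control points, are assumptions rather than anything defined in the text. It evaluates the coefficients from equation (14) with exact rational arithmetic and confirms that they sum to 1.

```python
from fractions import Fraction


def bspline_weights(t):
    """Coefficients of p0, p1, p2, p3 in the B-spline segment, equation (14)."""
    return (
        (1 - t) ** 3 / 6,
        (3 * t**3 - 6 * t**2 + 4) / 6,
        (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
        t**3 / 6,
    )


def bezier_weights(t):
    """Coefficients of p0, p1, p2, p3 in the cubic Bezier curve."""
    return ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t**2 * (1 - t), t**3)


def point_on_curve(weights, control_points):
    """Weighted combination of the control points, coordinate by coordinate."""
    dim = len(control_points[0])
    return tuple(
        sum(w * p[i] for w, p in zip(weights, control_points))
        for i in range(dim)
    )


# Sample control points, chosen only for illustration.
pts = [(0, 0), (1, 2), (3, 2), (4, 0)]

for k in range(5):
    t = Fraction(k, 4)
    w = bspline_weights(t)
    # Nonnegative weights that sum to 1 give a convex combination.
    print(t, sum(w), point_on_curve(w, pts), point_on_curve(bezier_weights(t), pts))
```

Because the weights are nonnegative on $0 \le t \le 1$ and sum to 1, each printed B-spline point is a convex combination of the control points; the Bezier point at $t = 0$ is $\mathbf{p}_0$ itself, while the B-spline point is not.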


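1. Show that the B-spline does not begin at $\mathbf{p}_0$, but $\mathbf{x}(0)$ is in conv $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$. Assuming
that $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$ are affinely independent, find the affine coordinates of $\mathbf{x}(0)$
with respect to $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$.

2. Show that the B-spline does not end at $\mathbf{p}_3$, but $\mathbf{x}(1)$ is in conv $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$. Assuming
that $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are affinely independent, find the affine coordinates of $\mathbf{x}(1)$ with
respect to $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$.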


8.6 EXERCISES 

1. Suppose a Bezier curve is translated to $\mathbf{x}(t) + \mathbf{b}$. That is, for
$0 \le t \le 1$, the new curve is

\[
\mathbf{x}(t) = (1-t)^3\mathbf{p}_0 + 3t(1-t)^2\mathbf{p}_1 + 3t^2(1-t)\mathbf{p}_2 + t^3\mathbf{p}_3 + \mathbf{b}
\]

Show that this new curve is again a Bezier curve. [Hint:
Where are the new control points?]

2. The parametric vector form of a B-spline curve was defined
in the Practice Problems as

\[
\mathbf{x}(t) = \tfrac{1}{6}\left[(1-t)^3\mathbf{p}_0 + \bigl(3t(1-t)^2 - 3t + 4\bigr)\mathbf{p}_1 + \bigl(3t^2(1-t) + 3t + 1\bigr)\mathbf{p}_2 + t^3\mathbf{p}_3\right] \quad \text{for } 0 \le t \le 1,
\]

where $\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$ are the control points.

a. Show that for $0 \le t \le 1$, $\mathbf{x}(t)$ is in the convex hull of the
control points.

b. Suppose that a B-spline curve $\mathbf{x}(t)$ is translated to
$\mathbf{x}(t) + \mathbf{b}$ (as in Exercise 1). Show that this new curve
is again a B-spline.

3. Let $\mathbf{x}(t)$ be a cubic Bezier curve determined by points $\mathbf{p}_0$, $\mathbf{p}_1$,
$\mathbf{p}_2$, and $\mathbf{p}_3$.

a. Compute the tangent vector $\mathbf{x}'(t)$. Determine how $\mathbf{x}'(0)$
and $\mathbf{x}'(1)$ are related to the control points, and give
geometric descriptions of the directions of these tangent
vectors. Is it possible to have $\mathbf{x}'(1) = \mathbf{0}$?

b. Compute the second derivative $\mathbf{x}''(t)$ and determine how
$\mathbf{x}''(0)$ and $\mathbf{x}''(1)$ are related to the control points. Draw
a figure based on Fig. 10, and construct a line segment
that points in the direction of $\mathbf{x}''(0)$. [Hint: Use $\mathbf{p}_2$ as the
origin of the coordinate system.]

4. Let $\mathbf{x}(t)$ be the B-spline in Exercise 2, with control points $\mathbf{p}_0$,
$\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$.

a. Compute the tangent vector $\mathbf{x}'(t)$ and determine how
the derivatives $\mathbf{x}'(0)$ and $\mathbf{x}'(1)$ are related to the control
points. Give geometric descriptions of the directions of
these tangent vectors. Explore what happens when both
$\mathbf{x}'(0)$ and $\mathbf{x}'(1)$ equal $\mathbf{0}$. Justify your assertions.

b. Compute the second derivative $\mathbf{x}''(t)$ and determine how
$\mathbf{x}''(0)$ and $\mathbf{x}''(1)$ are related to the control points. Draw
a figure based on Fig. 10, and construct a line segment
that points in the direction of $\mathbf{x}''(1)$. [Hint: Use $\mathbf{p}_2$ as the
origin of the coordinate system.]

5. Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ be cubic Bezier curves with control points
$\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ and $\{\mathbf{p}_3, \mathbf{p}_4, \mathbf{p}_5, \mathbf{p}_6\}$, respectively, so that $\mathbf{x}(t)$
and $\mathbf{y}(t)$ are joined at $\mathbf{p}_3$. The following questions refer to
the curve consisting of $\mathbf{x}(t)$ followed by $\mathbf{y}(t)$. For simplicity,
assume that the curve is in $\mathbb{R}^2$.

a. What condition on the control points will guarantee that
the curve has $C^1$ continuity at $\mathbf{p}_3$? Justify your answer.

b. What happens when $\mathbf{x}'(1)$ and $\mathbf{y}'(0)$ are both the zero
vector?

6. A B-spline is built out of B-spline segments, described in
Exercise 2. Let $\mathbf{p}_0, \ldots, \mathbf{p}_4$ be control points. For $0 \le t \le 1$,
let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ be determined by the geometry matrices
$[\,\mathbf{p}_0\ \mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3\,]$ and $[\,\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3\ \mathbf{p}_4\,]$, respectively.
Notice how the two segments share three control points.
The two segments do not overlap, however; they join at a
common endpoint, close to $\mathbf{p}_2$.

a. Show that the combined curve has $G^0$ continuity; that is,
$\mathbf{x}(1) = \mathbf{y}(0)$.

b. Show that the curve has $C^1$ continuity at the join point,
$\mathbf{x}(1)$. That is, show that $\mathbf{x}'(1) = \mathbf{y}'(0)$.

7. Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ be Bezier curves from Exercise 5, and suppose
the combined curve has $C^2$ continuity (which includes
$C^1$ continuity) at $\mathbf{p}_3$. Set $\mathbf{x}''(1) = \mathbf{y}''(0)$ and show that $\mathbf{p}_5$ is
completely determined by $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$. Thus, the points
$\mathbf{p}_0, \ldots, \mathbf{p}_3$ and the $C^2$ condition determine all but one of the
control points for $\mathbf{y}(t)$.

8. Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ be segments of a B-spline as in Exercise
6. Show that the curve has $C^2$ continuity (as well as $C^1$
continuity) at $\mathbf{x}(1)$. That is, show that $\mathbf{x}''(1) = \mathbf{y}''(0)$. This
higher-order continuity is desirable in CAD applications such
as automotive body design, since the curves and surfaces
appear much smoother. However, B-splines require three
times the computation of Bezier curves, for curves of comparable
length. For surfaces, B-splines require nine times the
computation of Bezier surfaces. Programmers often choose
Bezier surfaces for applications (such as an airplane cockpit
simulator) that require real-time rendering.


9. A quartic Bezier curve is determined by five control points,
$\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$, $\mathbf{p}_3$, and $\mathbf{p}_4$:

\[
\mathbf{x}(t) = (1-t)^4\mathbf{p}_0 + 4t(1-t)^3\mathbf{p}_1 + 6t^2(1-t)^2\mathbf{p}_2 + 4t^3(1-t)\mathbf{p}_3 + t^4\mathbf{p}_4 \quad \text{for } 0 \le t \le 1
\]

Construct the quartic basis matrix $M_B$ for $\mathbf{x}(t)$.

10. The “B” in B-spline refers to the fact that a segment $\mathbf{x}(t)$ may
be written in terms of a basis matrix, $M_S$, in a form similar
to a Bezier curve. That is,

\[
\mathbf{x}(t) = G M_S \mathbf{u}(t) \quad \text{for } 0 \le t \le 1
\]

where $G$ is the geometry matrix $[\,\mathbf{p}_0\ \mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3\,]$ and $\mathbf{u}(t)$
is the column vector $(1, t, t^2, t^3)$. In a uniform B-spline, each
segment uses the same basis matrix, but the geometry matrix
changes. Construct the basis matrix $M_S$ for $\mathbf{x}(t)$.

In Exercises 11 and 12, mark each statement True or False. Justify 
each answer. 

11. a. The cubic Bezier curve is based on four control points.

b. Given a quadratic Bezier curve $\mathbf{x}(t)$ with control points
$\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$, the directed line segment $\mathbf{p}_1 - \mathbf{p}_0$ (from $\mathbf{p}_0$
to $\mathbf{p}_1$) is the tangent vector to the curve at $\mathbf{p}_0$.

c. When two quadratic Bezier curves with control points
$\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$ and $\{\mathbf{p}_2, \mathbf{p}_3, \mathbf{p}_4\}$ are joined at $\mathbf{p}_2$, the combined
Bezier curve will have $C^1$ continuity at $\mathbf{p}_2$ if $\mathbf{p}_2$ is the
midpoint of the line segment between $\mathbf{p}_1$ and $\mathbf{p}_3$.

12. a. The essential properties of Bezier curves are preserved
under the action of linear transformations, but not translations.

b. When two Bezier curves $\mathbf{x}(t)$ and $\mathbf{y}(t)$ are joined at the
point where $\mathbf{x}(1) = \mathbf{y}(0)$, the combined curve has $G^0$
continuity at that point.

c. The Bezier basis matrix is a matrix whose columns are
the control points of the curve.

Exercises 13-15 concern the subdivision of a Bezier curve shown
in Fig. 7. Let $\mathbf{x}(t)$ be the Bezier curve, with control points
$\mathbf{p}_0, \ldots, \mathbf{p}_3$, and let $\mathbf{y}(t)$ and $\mathbf{z}(t)$ be the subdividing Bezier curves
as in the text, with control points $\mathbf{q}_0, \ldots, \mathbf{q}_3$ and $\mathbf{r}_0, \ldots, \mathbf{r}_3$, respectively.

13. a. Use equation (12) to show that $\mathbf{q}_1$ is the midpoint of the
segment from $\mathbf{p}_0$ to $\mathbf{p}_1$.

b. Use equation (13) to show that
$8\mathbf{q}_2 = 8\mathbf{q}_3 + \mathbf{p}_0 + \mathbf{p}_1 - \mathbf{p}_2 - \mathbf{p}_3$.

c. Use part (b), equation (8), and part (a) to show that
$\mathbf{q}_2$ is the midpoint of the segment from $\mathbf{q}_1$ to the
midpoint of the segment from $\mathbf{p}_1$ to $\mathbf{p}_2$. That is,
$\mathbf{q}_2 = \tfrac{1}{2}\left[\mathbf{q}_1 + \tfrac{1}{2}(\mathbf{p}_1 + \mathbf{p}_2)\right]$.

14. a. Justify each equals sign:
$3(\mathbf{r}_3 - \mathbf{r}_2) = \mathbf{z}'(1) = .5\,\mathbf{x}'(1) = \tfrac{3}{2}(\mathbf{p}_3 - \mathbf{p}_2)$.

b. Show that $\mathbf{r}_2$ is the midpoint of the segment from $\mathbf{p}_2$ to
$\mathbf{p}_3$.

c. Justify each equals sign: $3(\mathbf{r}_1 - \mathbf{r}_0) = \mathbf{z}'(0) = .5\,\mathbf{x}'(.5)$.

d. Use part (c) to show that $8\mathbf{r}_1 = -\mathbf{p}_0 - \mathbf{p}_1 + \mathbf{p}_2 + \mathbf{p}_3 + 8\mathbf{r}_0$.

e. Use part (d), equation (8), and part (a) to show that $\mathbf{r}_1$ is
the midpoint of the segment from $\mathbf{r}_2$ to the midpoint of the
segment from $\mathbf{p}_1$ to $\mathbf{p}_2$. That is, $\mathbf{r}_1 = \tfrac{1}{2}\left[\mathbf{r}_2 + \tfrac{1}{2}(\mathbf{p}_1 + \mathbf{p}_2)\right]$.

15. Sometimes only one half of a Bezier curve needs further
subdividing. For example, subdivision of the “left” side
is accomplished with parts (a) and (c) of Exercise 13 and
equation (8). When both halves of the curve $\mathbf{x}(t)$ are divided,
it is possible to organize calculations efficiently to calculate
both left and right control points concurrently, without using
equation (8) directly.

a. Show that the tangent vectors $\mathbf{y}'(1)$ and $\mathbf{z}'(0)$ are equal.

b. Use part (a) to show that $\mathbf{q}_3$ (which equals $\mathbf{r}_0$) is the
midpoint of the segment from $\mathbf{q}_2$ to $\mathbf{r}_1$.

c. Using part (b) and the results of Exercises 13 and 14, write
an algorithm that computes the control points for both
$\mathbf{y}(t)$ and $\mathbf{z}(t)$ in an efficient manner. The only operations
needed are sums and division by 2.

16. Explain why a cubic Bezier curve is completely determined
by $\mathbf{x}(0)$, $\mathbf{x}'(0)$, $\mathbf{x}(1)$, and $\mathbf{x}'(1)$.

17. TrueType® fonts, created by Apple Computer, use quadratic
Bezier curves, while PostScript® fonts, created by Adobe
Systems, use cubic Bezier curves. The
cubic curves provide more flexibility for typeface design,
but it is important that every typeface using
quadratic curves can be transformed into one that uses cubic
curves. Suppose that $\mathbf{w}(t)$ is a quadratic curve, with control
points $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$.

a. Find control points $\mathbf{r}_0$, $\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_3$ such that the cubic
Bezier curve $\mathbf{x}(t)$ with these control points has the property
that $\mathbf{x}(t)$ and $\mathbf{w}(t)$ have the same initial and terminal
points and the same tangent vectors at $t = 0$ and $t = 1$.
(See Exercise 16.)

b. Show that if $\mathbf{x}(t)$ is constructed as in part (a), then
$\mathbf{x}(t) = \mathbf{w}(t)$ for $0 \le t \le 1$.

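18. Use partitioned matrix multiplication to compute the following
matrix product, which appears in the alternative formula
(5) for a Bezier curve:

\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ -3 & 3 & 0 & 0 \\ 3 & -6 & 3 & 0 \\ -1 & 3 & -3 & 1 \end{bmatrix}
\begin{bmatrix} \mathbf{p}_0 \\ \mathbf{p}_1 \\ \mathbf{p}_2 \\ \mathbf{p}_3 \end{bmatrix}
\]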


SOLUTIONS TO PRACTICE PROBLEMS 


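1. From equation (14) with $t = 0$, $\mathbf{x}(0) \ne \mathbf{p}_0$ because

\[
\mathbf{x}(0) = \tfrac{1}{6}[\mathbf{p}_0 + 4\mathbf{p}_1 + \mathbf{p}_2] = \tfrac{1}{6}\mathbf{p}_0 + \tfrac{2}{3}\mathbf{p}_1 + \tfrac{1}{6}\mathbf{p}_2.
\]

The coefficients are nonnegative and sum to 1, so $\mathbf{x}(0)$ is in conv $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$, and
the affine coordinates with respect to $\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$ are $\left(\tfrac{1}{6}, \tfrac{2}{3}, \tfrac{1}{6}\right)$.

2. From equation (14) with $t = 1$, $\mathbf{x}(1) \ne \mathbf{p}_3$ because

\[
\mathbf{x}(1) = \tfrac{1}{6}[\mathbf{p}_1 + 4\mathbf{p}_2 + \mathbf{p}_3] = \tfrac{1}{6}\mathbf{p}_1 + \tfrac{2}{3}\mathbf{p}_2 + \tfrac{1}{6}\mathbf{p}_3.
\]

The coefficients are nonnegative and sum to 1, so $\mathbf{x}(1)$ is in conv $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$, and
the affine coordinates with respect to $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ are $\left(\tfrac{1}{6}, \tfrac{2}{3}, \tfrac{1}{6}\right)$.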
A

Uniqueness of the Reduced Echelon Form


THEOREM  Uniqueness of the Reduced Echelon Form

Each m × n matrix A is row equivalent to a unique reduced echelon matrix U.


PROOF The proof uses the idea from Section 4.3 that the columns of row-equivalent 
matrices have exactly the same linear dependence relations. 

The row reduction algorithm shows that there exists at least one such matrix U. 
Suppose that A is row equivalent to matrices U and V in reduced echelon form. The 
leftmost nonzero entry in a row of U is a “leading 1.” Call the location of such a leading
1 a pivot position, and call the column that contains it a pivot column. (This definition
uses only the echelon nature of U and V and does not assume the uniqueness of the 
reduced echelon form.) 

The pivot columns of U and V are precisely the nonzero columns that are not 
linearly dependent on the columns to their left. (This condition is satisfied automatically 
by a first column if it is nonzero.) Since U and V are row equivalent (both being row 
equivalent to A), their columns have the same linear dependence relations. Hence, the 
pivot columns of U and V appear in the same locations. If there are r such columns, 
then since U and V are in reduced echelon form, their pivot columns are the first r 
columns of the m x m identity matrix. Thus, corresponding pivot columns of U and V 
are equal. 

Finally, consider any nonpivot column of U, say column j. This column is either 
zero or a linear combination of the pivot columns to its left (because those pivot columns 
are a basis for the space spanned by the columns to the left of column j). Either case 
can be expressed by writing Ux = 0 for some x whose jth entry is 1. Then Vx = 0,
too, which says that column j of V is either zero or the same linear combination of the 
pivot columns of V to its left. Since corresponding pivot columns of U and V are equal, 
columns j of U and V are also equal. This holds for all nonpivot columns, so V = U, 
which proves that U is unique. 
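The theorem can be illustrated numerically, although a computation is of course not a proof. The sketch below is an assumption-laden example, not part of the text: the function `rref` is a plain Gauss-Jordan routine written for this illustration, and the matrices `A` and `B` are made up. `B` is obtained from `A` by one interchange, one scaling, and one replacement, so the two matrices are row equivalent and should reduce to the same matrix.

```python
from fractions import Fraction


def rref(rows):
    """Return the reduced echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    nrows, ncols = len(m), len(m[0])
    pivot_row = 0
    for col in range(ncols):
        # Look for a nonzero entry in this column at or below pivot_row.
        pr = next((r for r in range(pivot_row, nrows) if m[r][col] != 0), None)
        if pr is None:
            continue  # not a pivot column
        m[pivot_row], m[pr] = m[pr], m[pivot_row]      # interchange
        piv = m[pivot_row][col]
        m[pivot_row] = [x / piv for x in m[pivot_row]]  # scale to a leading 1
        for r in range(nrows):
            if r != pivot_row and m[r][col] != 0:
                f = m[r][col]
                # replacement: clear the rest of the pivot column
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == nrows:
            break
    return m


A = [[1, 2, 1, 3],
     [2, 4, 0, 2],
     [3, 6, 1, 5]]

# B: interchange rows 1 and 3 of A, scale row 2 by 5,
# then add -2 times the new row 1 to row 3.
B = [[3, 6, 1, 5],
     [10, 20, 0, 10],
     [-5, -10, -1, -7]]

print(rref(A) == rref(B))  # True
```

Exact fractions are used so that the comparison is not disturbed by floating-point roundoff.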


B

Complex Numbers 


A complex number is a number written in the form

\[
z = a + bi
\]

where $a$ and $b$ are real numbers and $i$ is a formal symbol satisfying the relation $i^2 = -1$.
The number $a$ is the real part of $z$, denoted by $\operatorname{Re} z$, and $b$ is the imaginary part of
$z$, denoted by $\operatorname{Im} z$. Two complex numbers are considered equal if and only if their
real and imaginary parts are equal. For example, if $z = 5 + (-2)i$, then $\operatorname{Re} z = 5$ and
$\operatorname{Im} z = -2$. For simplicity, we write $z = 5 - 2i$.

A real number $a$ is considered as a special type of complex number, by identifying
$a$ with $a + 0i$. Furthermore, arithmetic operations on real numbers can be extended to
the set of complex numbers.

The complex number system, denoted by $\mathbb{C}$, is the set of all complex numbers,
together with the following operations of addition and multiplication:

\[
(a + bi) + (c + di) = (a + c) + (b + d)i \tag{1}
\]

\[
(a + bi)(c + di) = (ac - bd) + (ad + bc)i \tag{2}
\]

These rules reduce to ordinary addition and multiplication of real numbers when
$b$ and $d$ are zero in (1) and (2). It is readily checked that the usual laws of arithmetic
for $\mathbb{R}$ also hold for $\mathbb{C}$. For this reason, multiplication is usually computed by algebraic
expansion, as in the following example.

EXAMPLE 1

\[
\begin{aligned}
(5 - 2i)(3 + 4i) &= 15 + 20i - 6i - 8i^2 \\
&= 15 + 14i - 8(-1) \\
&= 23 + 14i
\end{aligned}
\]

That is, multiply each term of $5 - 2i$ by each term of $3 + 4i$, use $i^2 = -1$, and write
the result in the form $a + bi$. ■
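Rules (1) and (2) translate directly into code. The sketch below is only an illustration: representing $a + bi$ as the pair `(a, b)` and the names `c_add` and `c_mul` are assumptions made here, not notation from the text. The result is compared with Python's built-in `complex` type.

```python
def c_add(z, w):
    """Rule (1): (a + bi) + (c + di) = (a + c) + (b + d)i."""
    a, b = z
    c, d = w
    return (a + c, b + d)


def c_mul(z, w):
    """Rule (2): (a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


# Example 1: (5 - 2i)(3 + 4i)
product = c_mul((5, -2), (3, 4))
print(product)                          # (23, 14)
print(complex(5, -2) * complex(3, 4))   # (23+14j)
```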

Subtraction of complex numbers $z_1$ and $z_2$ is defined by

\[
z_1 - z_2 = z_1 + (-1)z_2
\]

In particular, we write $-z$ in place of $(-1)z$.
The conjugate of $z = a + bi$ is the complex number $\bar{z}$ (read as “z bar”), defined
by

\[
\bar{z} = a - bi
\]

Obtain $\bar{z}$ from $z$ by reversing the sign of the imaginary part.


EXAMPLE 2 The conjugate of $-3 + 4i$ is $-3 - 4i$; write $\overline{-3 + 4i} = -3 - 4i$.


Observe that if $z = a + bi$, then

\[
z\bar{z} = (a + bi)(a - bi) = a^2 - abi + bai - b^2i^2 = a^2 + b^2 \tag{3}
\]

Since $z\bar{z}$ is real and nonnegative, it has a square root. The absolute value (or modulus)
of $z$ is the real number $|z|$ defined by

\[
|z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2}
\]

If $z$ is a real number, then $z = a + 0i$, and $|z| = \sqrt{a^2}$, which equals the ordinary
absolute value of $a$.

Some useful properties of conjugates and absolute value are listed below; $w$ and $z$
denote complex numbers.

1. $\bar{z} = z$ if and only if $z$ is a real number.

2. $\overline{w + z} = \bar{w} + \bar{z}$.

3. $\overline{wz} = \bar{w}\,\bar{z}$; in particular, $\overline{rz} = r\bar{z}$ if $r$ is a real number.

4. $z\bar{z} = |z|^2 \ge 0$.

5. $|wz| = |w|\,|z|$.

6. $|w + z| \le |w| + |z|$.

If $z \ne 0$, then $|z| > 0$ and $z$ has a multiplicative inverse, denoted by $1/z$ or $z^{-1}$
and given by

\[
\frac{1}{z} = z^{-1} = \frac{\bar{z}}{|z|^2}
\]

Of course, a quotient $w/z$ simply means $w \cdot (1/z)$.

EXAMPLE 3 Let $w = 3 + 4i$ and $z = 5 - 2i$. Compute $z\bar{z}$, $|z|$, and $w/z$.
SOLUTION From equation (3),

\[
z\bar{z} = 5^2 + (-2)^2 = 25 + 4 = 29
\]

For the absolute value, $|z| = \sqrt{z\bar{z}} = \sqrt{29}$. To compute $w/z$, first multiply both the
numerator and the denominator by $\bar{z}$, the conjugate of the denominator. Because of (3),
this eliminates the $i$ in the denominator:

\[
\begin{aligned}
\frac{w}{z} &= \frac{3 + 4i}{5 - 2i} \\
&= \frac{3 + 4i}{5 - 2i} \cdot \frac{5 + 2i}{5 + 2i} \\
&= \frac{15 + 6i + 20i - 8}{5^2 + (-2)^2} \\
&= \frac{7 + 26i}{29} \\
&= \frac{7}{29} + \frac{26}{29}i
\end{aligned}
\]


■ 
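The conjugate, the modulus, and the quotient rule can be checked with a few lines of code. This is a minimal sketch under stated assumptions: a complex number $a + bi$ with integer parts is stored as the pair `(a, b)`, and the helper names `conj`, `abs_sq`, `c_mul`, and `c_div` are invented for this illustration. Exact fractions keep the quotient in the form $\tfrac{7}{29} + \tfrac{26}{29}i$ rather than a decimal approximation.

```python
import math
from fractions import Fraction


def conj(z):
    """Reverse the sign of the imaginary part."""
    return (z[0], -z[1])


def abs_sq(z):
    """z times its conjugate, equation (3): a^2 + b^2."""
    return z[0] ** 2 + z[1] ** 2


def c_mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def c_div(w, z):
    """w / z = w * conj(z) / |z|^2, for z != 0 with integer parts."""
    num = c_mul(w, conj(z))
    den = abs_sq(z)
    return (Fraction(num[0], den), Fraction(num[1], den))


w, z = (3, 4), (5, -2)
print(abs_sq(z))             # 29
print(math.sqrt(abs_sq(z)))  # |z|, the square root of 29
print(c_div(w, z))           # (Fraction(7, 29), Fraction(26, 29))
```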


Geometric Interpretation 

Each complex number $z = a + bi$ corresponds to a point $(a, b)$ in the plane $\mathbb{R}^2$, as
in Fig. 1. The horizontal axis is called the real axis because the points $(a, 0)$ on it
correspond to the real numbers. The vertical axis is the imaginary axis because the
points $(0, b)$ on it correspond to the pure imaginary numbers of the form $0 + bi$, or
simply $bi$. The conjugate of $z$ is the mirror image of $z$ in the real axis. The absolute
value of $z$ is the distance from $(a, b)$ to the origin.


FIGURE 1 The complex conjugate is a mirror image.

Addition of complex numbers $z = a + bi$ and $w = c + di$ corresponds to vector
addition of $(a, b)$ and $(c, d)$ in $\mathbb{R}^2$, as in Fig. 2.

FIGURE 2 Addition of complex numbers.
To give a graphical representation of complex multiplication, we use polar coordinates
in $\mathbb{R}^2$. Given a nonzero complex number $z = a + bi$, let $\varphi$ be the angle between
the positive real axis and the point $(a, b)$, as in Fig. 3, where $-\pi < \varphi \le \pi$. The angle $\varphi$
is called the argument of $z$; we write $\varphi = \arg z$. From trigonometry,

\[
a = |z|\cos\varphi, \qquad b = |z|\sin\varphi
\]

and so

\[
z = a + bi = |z|(\cos\varphi + i\sin\varphi)
\]



If $w$ is another nonzero complex number, say,

\[
w = |w|(\cos\vartheta + i\sin\vartheta)
\]

then, using standard trigonometric identities for the sine and cosine of the sum of two
angles, one can verify that

\[
wz = |w|\,|z|\,[\cos(\vartheta + \varphi) + i\sin(\vartheta + \varphi)] \tag{4}
\]

See Fig. 4. A similar formula may be written for quotients in polar form. The formulas
for products and quotients can be stated in words as follows.




The product of two nonzero complex numbers is given in polar form by the 
product of their absolute values and the sum of their arguments. The quotient 
of two nonzero complex numbers is given by the quotient of their absolute values 
and the difference of their arguments. 
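The polar rule for products can be checked with Python's standard `cmath` module. The sketch below is illustrative only; the sample values of `w` and `z` and the helper name `polar_product` are assumptions. Note that the sum of two arguments may fall outside the interval $-\pi < \varphi \le \pi$; `cmath.rect` accepts any angle, so no adjustment is needed for the check.

```python
import cmath


def polar_product(w, z):
    """Multiply by the polar rule: multiply moduli, add arguments."""
    rw, theta = cmath.polar(w)   # modulus and argument of w
    rz, phi = cmath.polar(z)     # modulus and argument of z
    return cmath.rect(rw * rz, theta + phi)


w = complex(1, 1)
z = complex(0, 2)
print(polar_product(w, z))   # approximately -2 + 2i
print(w * z)                 # (-2+2j)

# Multiplication by i rotates through pi/2: (3 + i)i = -1 + 3i.
print(complex(3, 1) * 1j)    # (-1+3j)
```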


EXAMPLE 4

a. If $w$ has absolute value 1, then $w = \cos\vartheta + i\sin\vartheta$, where $\vartheta$ is the argument of $w$.
Multiplication of any nonzero number $z$ by $w$ simply rotates $z$ through the angle $\vartheta$.

b. The argument of $i$ itself is $\pi/2$ radians, so multiplication of $z$ by $i$ rotates $z$ through
an angle of $\pi/2$ radians. For example, $3 + i$ is rotated into $(3 + i)i = -1 + 3i$. ■


Multiplication by i. 











Powers of a Complex Number 

Formula (4) applies when $z = w = r(\cos\varphi + i\sin\varphi)$. In this case

\[
z^2 = r^2(\cos 2\varphi + i\sin 2\varphi)
\]

and

\[
\begin{aligned}
z^3 &= z \cdot z^2 \\
&= r(\cos\varphi + i\sin\varphi) \cdot r^2(\cos 2\varphi + i\sin 2\varphi) \\
&= r^3(\cos 3\varphi + i\sin 3\varphi)
\end{aligned}
\]

In general, for any positive integer $k$,

\[
z^k = r^k(\cos k\varphi + i\sin k\varphi)
\]


This fact is known as De Moivre’s Theorem. 
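The power formula is easy to test numerically. The following sketch uses the standard `cmath` module; the function name `power_polar` and the sample value $1 + i$ are assumptions made for illustration. Since $(1 + i)^2 = 2i$, the eighth power is $(2i)^4 = 16$.

```python
import cmath


def power_polar(z, k):
    """z^k computed as r^k (cos k*phi + i sin k*phi)."""
    r, phi = cmath.polar(z)
    return cmath.rect(r ** k, k * phi)


z = complex(1, 1)
print(power_polar(z, 8))  # approximately 16, up to floating-point roundoff
print(z ** 8)             # direct computation for comparison
```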


Complex Numbers and $\mathbb{R}^2$

Although the elements of $\mathbb{R}^2$ and $\mathbb{C}$ are in one-to-one correspondence, and the operations
of addition are essentially the same, there is a logical distinction between $\mathbb{R}^2$ and $\mathbb{C}$. In
$\mathbb{R}^2$ we can only multiply a vector by a real scalar, whereas in $\mathbb{C}$ we can multiply any
two complex numbers to obtain a third complex number. (The dot product in $\mathbb{R}^2$ doesn't
count, because it produces a scalar, not an element of $\mathbb{R}^2$.) We use scalar notation for
elements in $\mathbb{C}$ to emphasize this distinction.


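[Figure: the points $(2, 4)$, $(-1, 2)$, $(4, 0)$, $(-3, -1)$, and $(3, -2)$ plotted in the real plane $\mathbb{R}^2$, beside the corresponding numbers $2 + 4i$, $-1 + 2i$, $4 + 0i$, $-3 - i$, and $3 - 2i$ plotted in the complex plane $\mathbb{C}$.]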






Glossary 


A 

adjugate (or classical adjoint): The matrix adj $A$ formed from
a square matrix $A$ by replacing the $(i, j)$-entry of $A$ by the
$(i, j)$-cofactor, for all $i$ and $j$, and then transposing the
resulting matrix.

affine combination: A linear combination of vectors (points in
$\mathbb{R}^n$) in which the sum of the weights involved is 1.

affine dependence relation: An equation of the form
$c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{0}$, where the weights $c_1, \ldots, c_p$ are not all
zero, and $c_1 + \cdots + c_p = 0$.

affine hull (or affine span) of a set $S$: The set of all affine
combinations of points in $S$, denoted by aff $S$.

affinely dependent set: A set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ such that there
are real numbers $c_1, \ldots, c_p$, not all zero, such that
$c_1 + \cdots + c_p = 0$ and $c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{0}$.

affinely independent set: A set $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in $\mathbb{R}^n$ that is not
affinely dependent.

affine set (or affine subset): A set $S$ of points such that if $\mathbf{p}$ and
$\mathbf{q}$ are in $S$, then $(1 - t)\mathbf{p} + t\mathbf{q} \in S$ for each real number $t$.

affine transformation: A mapping $T : \mathbb{R}^n \to \mathbb{R}^m$ of the form
$T(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$, with $A$ an $m \times n$ matrix and $\mathbf{b}$ in $\mathbb{R}^m$.

algebraic multiplicity: The multiplicity of an eigenvalue as a
root of the characteristic equation.

angle (between nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^2$ or $\mathbb{R}^3$): The angle
$\vartheta$ between the two directed line segments from the origin to
the points $\mathbf{u}$ and $\mathbf{v}$. Related to the scalar product by

\[
\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\vartheta
\]

associative law of multiplication: $A(BC) = (AB)C$, for all $A$,
$B$, $C$.

attractor (of a dynamical system in $\mathbb{R}^2$): The origin when all
trajectories tend toward $\mathbf{0}$.

augmented matrix: A matrix made up of a coefficient matrix
for a linear system and one or more columns to the right.
Each extra column contains the constants from the right side
of a system with the given coefficient matrix.

auxiliary equation: A polynomial equation in a variable $r$,
created from the coefficients of a homogeneous difference
equation.

B 

back-substitution (with matrix notation): The backward phase
of row reduction of an augmented matrix that transforms an
echelon matrix into a reduced echelon matrix; used to find
the solution(s) of a system of linear equations.

backward phase (of row reduction): The last part of the algorithm
that reduces a matrix in echelon form to a reduced
echelon form.

band matrix: A matrix whose nonzero entries lie within a band
along the main diagonal.

barycentric coordinates (of a point $\mathbf{p}$ with respect to an affinely
independent set $S = \{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$): The (unique) set of
weights $c_1, \ldots, c_k$ such that $\mathbf{p} = c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$ and
$c_1 + \cdots + c_k = 1$. (Sometimes also called the affine coordinates
of $\mathbf{p}$ with respect to $S$.)

basic variable: A variable in a linear system that corresponds
to a pivot column in the coefficient matrix.

basis (for a nontrivial subspace $H$ of a vector space $V$): An
indexed set $\mathcal{B} = \{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ in $V$ such that: (i) $\mathcal{B}$ is a
linearly independent set and (ii) the subspace spanned by $\mathcal{B}$
coincides with $H$, that is, $H = \operatorname{Span}\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$.

$\mathcal{B}$-coordinates of $\mathbf{x}$: See coordinates of $\mathbf{x}$ relative to the basis
$\mathcal{B}$.

best approximation: The closest point in a given subspace to a
given vector.

bidiagonal matrix: A matrix whose nonzero entries lie on the
main diagonal and on one diagonal adjacent to the main
diagonal.

block diagonal (matrix): A partitioned matrix $A = [A_{ij}]$ such
that each block $A_{ij}$ is a zero matrix for $i \ne j$.

block matrix: See partitioned matrix.

block matrix multiplication: The row-column multiplication
of partitioned matrices as if the block entries were scalars.

block upper triangular (matrix): A partitioned matrix
$A = [A_{ij}]$ such that each block $A_{ij}$ is a zero matrix for
$i > j$.

boundary point of a set $S$ in $\mathbb{R}^n$: A point $\mathbf{p}$ such that every open
ball in $\mathbb{R}^n$ centered at $\mathbf{p}$ intersects both $S$ and the complement
of $S$.

bounded set in $\mathbb{R}^n$: A set that is contained in an open ball
$B(\mathbf{0}, \delta)$ for some $\delta > 0$.

$\mathcal{B}$-matrix (for $T$): A matrix $[T]_{\mathcal{B}}$ for a linear transformation
$T : V \to V$ relative to a basis $\mathcal{B}$ for $V$, with the property
that $[T(\mathbf{x})]_{\mathcal{B}} = [T]_{\mathcal{B}}[\mathbf{x}]_{\mathcal{B}}$ for all $\mathbf{x}$ in $V$.

C

Cauchy-Schwarz inequality: $|\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\| \cdot \|\mathbf{v}\|$ for all $\mathbf{u}$, $\mathbf{v}$.

change of basis: See change-of-coordinates matrix. 


change-of-coordinates matrix (from a basis $\mathcal{B}$ to a basis $\mathcal{C}$): A
matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ that transforms $\mathcal{B}$-coordinate vectors into
$\mathcal{C}$-coordinate vectors: $[\mathbf{x}]_{\mathcal{C}} = P_{\mathcal{C} \leftarrow \mathcal{B}}[\mathbf{x}]_{\mathcal{B}}$. If $\mathcal{C}$ is the standard
basis for $\mathbb{R}^n$, then $P_{\mathcal{C} \leftarrow \mathcal{B}}$ is sometimes written as $P_{\mathcal{B}}$.

characteristic equation (of $A$): $\det(A - \lambda I) = 0$.

characteristic polynomial (of $A$): $\det(A - \lambda I)$ or, in some
texts, $\det(\lambda I - A)$.

Cholesky factorization: A factorization $A = R^T R$, where $R$ is
an invertible upper triangular matrix whose diagonal entries
are all positive.

closed ball (in $\mathbb{R}^n$): A set $\{\mathbf{x} : \|\mathbf{x} - \mathbf{p}\| \le \delta\}$ in $\mathbb{R}^n$, where $\mathbf{p}$ is
in $\mathbb{R}^n$ and $\delta > 0$.

closed set (in $\mathbb{R}^n$): A set that contains all of its boundary points.

codomain (of a transformation $T : \mathbb{R}^n \to \mathbb{R}^m$): The set $\mathbb{R}^m$ that
contains the range of $T$. In general, if $T$ maps a vector space
$V$ into a vector space $W$, then $W$ is called the codomain of
$T$.

coefficient matrix: A matrix whose entries are the coefficients
of a system of linear equations.

cofactor: A number $C_{ij} = (-1)^{i+j} \det A_{ij}$, called the $(i, j)$-cofactor
of $A$, where $A_{ij}$ is the submatrix formed by deleting
the $i$th row and the $j$th column of $A$.
cofactor expansion: A formula for $\det A$ using cofactors associated
with one row or one column, such as for row 1:

\[
\det A = a_{11}C_{11} + \cdots + a_{1n}C_{1n}
\]

column-row expansion: The expression of a product $AB$
as a sum of outer products: $\operatorname{col}_1(A)\operatorname{row}_1(B) + \cdots +
\operatorname{col}_n(A)\operatorname{row}_n(B)$, where $n$ is the number of columns of $A$.

column space (of an $m \times n$ matrix $A$): The set Col $A$ of all
linear combinations of the columns of $A$. If $A = [\,\mathbf{a}_1 \cdots \mathbf{a}_n\,]$,
then Col $A = \operatorname{Span}\{\mathbf{a}_1, \ldots, \mathbf{a}_n\}$. Equivalently,

\[
\operatorname{Col} A = \{\mathbf{y} : \mathbf{y} = A\mathbf{x} \text{ for some } \mathbf{x} \text{ in } \mathbb{R}^n\}
\]

column sum: The sum of the entries in a column of a matrix.

column vector: A matrix with only one column, or a single
column of a matrix that has several columns.

commuting matrices: Two matrices $A$ and $B$ such that
$AB = BA$.

compact set (in $\mathbb{R}^n$): A set in $\mathbb{R}^n$ that is both closed and
bounded.

companion matrix: A special form of matrix whose characteristic
polynomial is $(-1)^n p(\lambda)$ when $p(\lambda)$ is a specified
polynomial whose leading term is $\lambda^n$.

complex eigenvalue: A nonreal root of the characteristic equation
of an $n \times n$ matrix.

complex eigenvector: A nonzero vector $\mathbf{x}$ in $\mathbb{C}^n$ such that
$A\mathbf{x} = \lambda\mathbf{x}$, where $A$ is an $n \times n$ matrix and $\lambda$ is a complex
eigenvalue.

component of $\mathbf{y}$ orthogonal to $\mathbf{u}$ (for $\mathbf{u} \ne \mathbf{0}$): The vector

\[
\mathbf{y} - \frac{\mathbf{y} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u}
\]

composition of linear transformations: A mapping produced
by applying two or more linear transformations in succession.
If the transformations are matrix transformations, say
left-multiplication by $B$ followed by left-multiplication by
$A$, then the composition is the mapping $\mathbf{x} \mapsto A(B\mathbf{x})$.

condition number (of $A$): The quotient $\sigma_1/\sigma_n$, where $\sigma_1$ is the
largest singular value of $A$ and $\sigma_n$ is the smallest singular
value. The condition number is $+\infty$ when $\sigma_n$ is zero.

conformable for block multiplication: Two partitioned matrices
$A$ and $B$ such that the block product $AB$ is defined: The
column partition of $A$ must match the row partition of $B$.

consistent linear system: A linear system with at least one
solution.

constrained optimization: The problem of maximizing a quantity
such as $\mathbf{x}^T A\mathbf{x}$ or $\|A\mathbf{x}\|$ when $\mathbf{x}$ is subject to one or more
constraints, such as $\mathbf{x}^T\mathbf{x} = 1$ or $\mathbf{x}^T\mathbf{v} = 0$.

consumption matrix: A matrix in the Leontief input-output
model whose columns are the unit consumption vectors for
the various sectors of an economy.

contraction: A mapping $\mathbf{x} \mapsto r\mathbf{x}$ for some scalar $r$, with
$0 \le r \le 1$.

controllable (pair of matrices): A matrix pair $(A, B)$ where $A$
is $n \times n$, $B$ has $n$ rows, and

\[
\operatorname{rank}\,[\,B \;\; AB \;\; A^2B \;\cdots\; A^{n-1}B\,] = n
\]

Related to a state-space model of a control system and the
difference equation $\mathbf{x}_{k+1} = A\mathbf{x}_k + B\mathbf{u}_k$ $(k = 0, 1, \ldots)$.

convergent (sequence of vectors): A sequence $\{\mathbf{x}_k\}$ such that
the entries in $\mathbf{x}_k$ can be made as close as desired to the entries
in some fixed vector for all $k$ sufficiently large.

convex combination (of points $\mathbf{v}_1, \ldots, \mathbf{v}_k$ in $\mathbb{R}^n$): A linear
combination of vectors (points) in which the weights in the
combination are nonnegative and the sum of the weights
is 1.

convex hull (of a set $S$): The set of all convex combinations of
points in $S$, denoted by conv $S$.

convex set: A set $S$ with the property that for each $\mathbf{p}$ and $\mathbf{q}$ in
$S$, the line segment $\overline{\mathbf{p}\mathbf{q}}$ is contained in $S$.

coordinate mapping (determined by an ordered basis $\mathcal{B}$ in a
vector space $V$): A mapping that associates to each $\mathbf{x}$ in
$V$ its coordinate vector $[\mathbf{x}]_{\mathcal{B}}$.

coordinates of $\mathbf{x}$ relative to the basis $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$: The
weights $c_1, \ldots, c_n$ in the equation $\mathbf{x} = c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n$.

coordinate vector of $\mathbf{x}$ relative to $\mathcal{B}$: The vector $[\mathbf{x}]_{\mathcal{B}}$ whose
entries are the coordinates of $\mathbf{x}$ relative to the basis $\mathcal{B}$.

covariance (of variables $x_i$ and $x_j$, for $i \ne j$): The entry $s_{ij}$ in
the covariance matrix $S$ for a matrix of observations, where
$x_i$ and $x_j$ vary over the $i$th and $j$th coordinates, respectively,
of the observation vectors.

covariance matrix (or sample covariance matrix): The $p \times p$
matrix $S$ defined by $S = (N - 1)^{-1}BB^T$, where $B$ is a
$p \times N$ matrix of observations in mean-deviation form.
Cramer's rule: A formula for each entry in the solution $\mathbf{x}$ of
the equation $A\mathbf{x} = \mathbf{b}$ when $A$ is an invertible matrix.

cross-product term: A term $c x_i x_j$ in a quadratic form, with
$i \ne j$.

cube: A three-dimensional solid object bounded by six square
faces, with three faces meeting at each vertex.

D 

decoupled system: A difference equation $\mathbf{y}_{k+1} = A\mathbf{y}_k$, or a
differential equation $\mathbf{y}'(t) = A\mathbf{y}(t)$, in which $A$ is a diagonal
matrix. The discrete evolution of each entry in $\mathbf{y}_k$ (as a
function of $k$), or the continuous evolution of each entry
in the vector-valued function $\mathbf{y}(t)$, is unaffected by what
happens to the other entries as $k \to \infty$ or $t \to \infty$.

design matrix: The matrix $X$ in the linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$,
where the columns of $X$ are determined in some way by the
observed values of some independent variables.

determinant (of a square matrix $A$): The number $\det A$ defined
inductively by a cofactor expansion along the first row of $A$.
Also, $(-1)^r$ times the product of the diagonal entries in any
echelon form $U$ obtained from $A$ by row replacements and
$r$ row interchanges (but no scaling operations).

diagonal entries (in a matrix): Entries having equal row and
column indices.

diagonalizable (matrix): A matrix that can be written in factored
form as $PDP^{-1}$, where $D$ is a diagonal matrix and $P$
is an invertible matrix.

diagonal matrix: A square matrix whose entries not on the
main diagonal are all zero.

difference equation (or linear recurrence relation): An equation
of the form $\mathbf{x}_{k+1} = A\mathbf{x}_k$ $(k = 0, 1, 2, \ldots)$ whose solution
is a sequence of vectors, $\mathbf{x}_0, \mathbf{x}_1, \ldots$.

dilation: A mapping $\mathbf{x} \mapsto r\mathbf{x}$ for some scalar $r$, with $1 < r$.

dimension:

of a flat $S$: The dimension of the corresponding parallel
subspace.

of a set $S$: The dimension of the smallest flat containing $S$.

of a subspace $S$: The number of vectors in a basis for $S$,
written as $\dim S$.

of a vector space $V$: The number of vectors in a basis for $V$,
written as $\dim V$. The dimension of the zero space is 0.

discrete linear dynamical system: A difference equation of the
form $\mathbf{x}_{k+1} = A\mathbf{x}_k$ that describes the changes in a system
(usually a physical system) as time passes. The physical
system is measured at discrete times, when $k = 0, 1, 2, \ldots$,
and the state of the system at time $k$ is a vector $\mathbf{x}_k$ whose
entries provide certain facts of interest about the system.

distance between $\mathbf{u}$ and $\mathbf{v}$: The length of the vector $\mathbf{u} - \mathbf{v}$,
denoted by $\operatorname{dist}(\mathbf{u}, \mathbf{v})$.

distance to a subspace: The distance from a given point (vector)
$\mathbf{v}$ to the nearest point in the subspace.

distributive laws: (left) $A(B + C) = AB + AC$, and (right)
$(B + C)A = BA + CA$, for all $A$, $B$, $C$.

domain (of a transformation $T$): The set of all vectors $\mathbf{x}$ for
which $T(\mathbf{x})$ is defined.

dot product: See inner product.

dynamical system: See discrete linear dynamical system.

E 

echelon form (or row echelon form, of a matrix): An echelon
matrix that is row equivalent to the given matrix.

echelon matrix (or row echelon matrix): A rectangular matrix
that has three properties: (1) All nonzero rows are above
any row of all zeros. (2) Each leading entry of a row is in
a column to the right of the leading entry of the row above
it. (3) All entries in a column below a leading entry are zero.

eigenfunctions (of a differential equation x'(t) = Ax(t)): A
function x(t) = ve^{λt}, where v is an eigenvector of A and λ
is the corresponding eigenvalue.

eigenspace (of A corresponding to λ): The set of all solutions
of Ax = λx, where λ is an eigenvalue of A. Consists of the
zero vector and all eigenvectors corresponding to λ.

eigenvalue (of A): A scalar λ such that the equation Ax = λx
has a solution for some nonzero vector x.

eigenvector (of A): A nonzero vector x such that Ax = λx for
some scalar λ.

eigenvector basis: A basis consisting entirely of eigenvectors 
of a given matrix. 

eigenvector decomposition (of x): An equation, x = c_1 v_1 +
... + c_n v_n, expressing x as a linear combination of eigenvectors
of a matrix.

elementary matrix: An invertible matrix that results by performing
one elementary row operation on an identity matrix.

elementary row operations: (1) (Replacement) Replace one
row by the sum of itself and a multiple of another row. (2)
Interchange two rows. (3) (Scaling) Multiply all entries in a
row by a nonzero constant.
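
As an illustration (not part of the original glossary), the three elementary row operations can be sketched in Python on a matrix stored as a list of rows. The function names are hypothetical, and rows are indexed from 0.

```python
from fractions import Fraction

def replace_row(M, i, j, c):
    """Replacement: add c times row j to row i (two different rows)."""
    if i == j:
        raise ValueError("replacement uses two different rows")
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]

def interchange_rows(M, i, j):
    """Interchange rows i and j."""
    M[i], M[j] = M[j], M[i]

def scale_row(M, i, c):
    """Scaling: multiply every entry of row i by a nonzero constant c."""
    if c == 0:
        raise ValueError("scaling constant must be nonzero")
    M[i] = [c * a for a in M[i]]

M = [[1, 2], [3, 4]]
replace_row(M, 1, 0, -3)          # row 1 becomes [0, -2]
scale_row(M, 1, Fraction(-1, 2))  # row 1 becomes [0, 1]
interchange_rows(M, 0, 1)         # M is now [[0, 1], [1, 2]]
```

Each operation is reversible by another operation of the same type, which is why elementary matrices are invertible.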

equal vectors: Vectors in R^n whose corresponding entries are
the same. 

equilibrium prices: A set of prices for the total output of the
various sectors in an economy, such that the income of each
sector exactly balances its expenses.

equilibrium vector: See steady-state vector.

equivalent (linear) systems: Linear systems with the same
solution set.

exchange model: See Leontief exchange model.

existence question: Asks, “Does a solution to the system exist?”
That is, “Is the system consistent?” Also, “Does a
solution of Ax = b exist for all possible b?”

expansion by cofactors: See cofactor expansion.

explicit description (of a subspace W of R^n): A parametric
representation of W as the set of all linear combinations of
a set of specified vectors.

extreme point (of a convex set S): A point p in S such that p is
not in the interior of any line segment that lies in S. (That is,
if x, y are in S and p is on the line segment xy, then p = x
or p = y.)

F

factorization (of A): An equation that expresses A as a product
of two or more matrices.

final demand vector (or bill of final demands): The vector d 
in the Leontief input-output model that lists the dollar values 
of the goods and services demanded from the various sectors 
by the nonproductive part of the economy. The vector d 
can represent consumer demand, government consumption, 
surplus production, exports, or other external demand.

finite-dimensional (vector space): A vector space that is
spanned by a finite set of vectors.

flat (in R^n): A translate of a subspace of R^n.

flexibility matrix: A matrix whose jth column gives the deflections
of an elastic beam at specified points when a unit
force is applied at the jth point on the beam.

floating point arithmetic: Arithmetic with numbers represented
as decimals ± .d_1 ... d_p x 10^r, where r is an integer
and the number p of digits to the right of the decimal point
is usually between 8 and 16.

flop: One arithmetic operation (+, -, *, /) on two real floating
point numbers.

forward phase (of row reduction): The first part of the algorithm
that reduces a matrix to echelon form.

Fourier approximation (of order n): The closest point in the
subspace of nth-order trigonometric polynomials to a given
function in C[0, 2π].

Fourier coefficients: The weights used to make a trigonometric
polynomial as a Fourier approximation to a function.

Fourier series: An infinite series that converges to a function
in the inner product space C[0, 2π], with the inner product
given by a definite integral. 

free variable: Any variable in a linear system that is not a basic 
variable. 

full rank (matrix): An m x n matrix whose rank is the smaller
of m and n. 

fundamental set of solutions: A basis for the set of all solutions
of a homogeneous linear difference or differential equation.

fundamental subspaces (determined by A): The null space and
column space of A, and the null space and column space of
A^T, with Col A^T commonly called the row space of A.

G 

Gaussian elimination: See row reduction algorithm.

general least-squares problem: Given an m x n matrix
A and a vector b in R^m, find x̂ in R^n such that
||b - Ax̂|| ≤ ||b - Ax|| for all x in R^n.

general solution (of a linear system): A parametric description
of a solution set that expresses the basic variables in terms of
the free variables (the parameters), if any. After Section 1.5,
the parametric description is written in vector form.

Givens rotation: A linear transformation from R^n to R^n used in
computer programs to create zero entries in a vector (usually
a column of a matrix).

Gram matrix (of A): The matrix A^T A.

Gram-Schmidt process: An algorithm for producing an orthogonal
or orthonormal basis for a subspace that is spanned
by a given set of vectors.
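
A minimal sketch of the Gram-Schmidt idea in Python, assuming vectors are given as lists of numbers. The helper names and the tolerance `tol` are illustrative choices, not taken from the text; vectors that are (numerically) dependent on earlier ones are skipped.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthogonal basis for the span of `vectors`."""
    basis = []
    for x in vectors:
        v = list(x)
        for b in basis:
            # Subtract the orthogonal projection of x onto b.
            coeff = dot(x, b) / dot(b, b)
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        if dot(v, v) > tol:  # keep v only if it is not (nearly) zero
            basis.append(v)
    return basis

# Exact arithmetic with fractions avoids roundoff in this small example.
xs = [[Fraction(1)] * 4,
      [Fraction(0), Fraction(1), Fraction(1), Fraction(1)],
      [Fraction(0), Fraction(0), Fraction(1), Fraction(1)]]
vs = gram_schmidt(xs)
```

Dividing each resulting vector by its length would turn the orthogonal basis into an orthonormal one.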

H 

homogeneous coordinates: In R^3, the representation of
(x, y, z) as (X, Y, Z, H) for any H ≠ 0, where x = X/H,
y = Y/H, and z = Z/H. In R^2, H is usually taken as 1,
and the homogeneous coordinates of (x, y) are written as
(x, y, 1).

homogeneous equation: An equation of the form Ax = 0, possibly
written as a vector equation or as a system of linear
equations.

homogeneous form of (a vector) v in R^n: The point ṽ = (v, 1)
in R^{n+1}.

Householder reflection: A transformation x ↦ Qx, where
Q = I - 2uu^T and u is a unit vector (u^T u = 1).

hyperplane (in R^n): A flat in R^n of dimension n - 1. Also: a
translate of a subspace of dimension n - 1.


I

identity matrix (denoted by I or I_n): A square matrix with ones
on the diagonal and zeros elsewhere.

ill-conditioned matrix: A square matrix with a large (or possibly
infinite) condition number; a matrix that is singular or
can become singular if some of its entries are changed ever
so slightly.

image (of a vector x under a transformation T): The vector T (x) 
assigned to x by T. 

implicit description (of a subspace W of R^n): A set of one
or more homogeneous equations that characterize the points 
of W. 

Im x: The vector in R^n formed from the imaginary parts of the
entries of a vector x in C^n.

inconsistent linear system: A linear system with no solution.

indefinite matrix: A symmetric matrix A such that x^T Ax assumes
both positive and negative values.

indefinite quadratic form: A quadratic form Q such that Q(x)
assumes both positive and negative values.

infinite-dimensional (vector space): A nonzero vector space V
that has no finite basis.

inner product: The scalar u^T v, usually written as u · v, where
u and v are vectors in R^n viewed as n x 1 matrices. Also
called the dot product of u and v. In general, a function on
a vector space that assigns to each pair of vectors u and v a
number ⟨u, v⟩, subject to certain axioms. See Section 6.7.

inner product space: A vector space on which is defined an 
inner product. 

input-output matrix: See consumption matrix. 

input-output model: See Leontief input-output model. 

interior point (of a set S in R^n): A point p in S such that for
some δ > 0, the open ball B(p, δ) centered at p is contained
in S.

intermediate demands: Demands for goods or services that 
will be consumed in the process of producing other goods 
and services for consumers. If x is the production level and 
C is the consumption matrix, then Cx lists the intermediate 
demands. 

interpolating polynomial: A polynomial whose graph passes
through every point in a set of data points in R^2.

invariant subspace (for A): A subspace H such that Ax is in
H whenever x is in H.

inverse (of an n x n matrix A): An n x n matrix A^{-1} such that
AA^{-1} = A^{-1}A = I_n.

inverse power method: An algorithm for estimating an eigenvalue
λ of a square matrix, when a good initial estimate of λ
is available.

invertible linear transformation: A linear transformation
T: R^n → R^n such that there exists a function S: R^n → R^n
satisfying both T(S(x)) = x and S(T(x)) = x for all x in
R^n.

invertible matrix: A square matrix that possesses an inverse. 

isomorphic vector spaces: Two vector spaces V and W for
which there is a one-to-one linear transformation T that maps
V onto W.

isomorphism: A one-to-one linear mapping from one vector
space onto another.

K

kernel (of a linear transformation T: V → W): The set of x in
V such that T(x) = 0.

Kirchhoff’s laws: (1) (voltage law) The algebraic sum of the 
RI voltage drops in one direction around a loop equals the 
algebraic sum of the voltage sources in the same direction 
around the loop. (2) (current law) The current in a branch 
is the algebraic sum of the loop currents flowing through that 
branch. 


L

ladder network: An electrical network assembled by connecting
in series two or more electrical circuits.

leading entry: The leftmost nonzero entry in a row of a matrix.

least-squares error: The distance ||b - Ax̂|| from b to Ax̂,
when x̂ is a least-squares solution of Ax = b.

least-squares line: The line y = β_0 + β_1 x that minimizes the
least-squares error in the equation y = Xβ + ε.

least-squares solution (of Ax = b): A vector x̂ such that
||b - Ax̂|| ≤ ||b - Ax|| for all x in R^n.

left inverse (of A): Any rectangular matrix C such that
CA = I.

left-multiplication (by A): Multiplication of a vector or matrix
on the left by A.

left singular vectors (of A): The columns of U in the singular
value decomposition A = UΣV^T.

length (or norm, of v): The scalar ||v|| = √(v · v) = √⟨v, v⟩.

Leontief exchange (or closed) model: A model of an economy 
where inputs and outputs are fixed, and where a set of prices 
for the outputs of the sectors is sought such that the income 
of each sector equals its expenditures. This “equilibrium” 
condition is expressed as a system of linear equations, with 
the prices as the unknowns. 

Leontief input-output model (or Leontief production equation):
The equation x = Cx + d, where x is production, d
is final demand, and C is the consumption (or input-output)
matrix. The jth column of C lists the inputs that sector j
consumes per unit of output.

level set (or gradient) of a linear functional f on R^n: A set
[f : d] = {x ∈ R^n : f(x) = d}.

linear combination: A sum of scalar multiples of vectors. The 
scalars are called the weights. 

linear dependence relation: A homogeneous vector equation 
where the weights are all specified and at least one weight is 
nonzero. 

linear equation (in the variables x_1, ..., x_n): An equation that
can be written in the form a_1 x_1 + a_2 x_2 + ... + a_n x_n = b,
where b and the coefficients a_1, ..., a_n are real or complex
numbers.

linear filter: A linear difference equation used to transform 
discrete-time signals. 

linear functional (on R^n): A linear transformation f from R^n
into R.

linearly dependent (vectors): An indexed set {v_1, ..., v_p} with
the property that there exist weights c_1, ..., c_p, not all zero,
such that c_1 v_1 + ... + c_p v_p = 0. That is, the vector equation
c_1 v_1 + c_2 v_2 + ... + c_p v_p = 0 has a nontrivial solution.

linearly independent (vectors): An indexed set {v_1, ..., v_p}
with the property that the vector equation c_1 v_1 +
c_2 v_2 + ... + c_p v_p = 0 has only the trivial solution,
c_1 = ... = c_p = 0.

linear model (in statistics): Any equation of the form
y = Xβ + ε, where X and y are known and β is to be chosen
to minimize the length of the residual vector, ε.

linear system: A collection of one or more linear equations
involving the same variables, say, x_1, ..., x_n.

linear transformation T (from a vector space V into a vector
space W): A rule T that assigns to each vector
x in V a unique vector T(x) in W, such that (i)
T(u + v) = T(u) + T(v) for all u, v in V, and (ii)
T(cu) = cT(u) for all u in V and all scalars c. Notation:
T: V → W; also, x ↦ Ax when T: R^n → R^m and A is the
standard matrix for T.

line through p parallel to v: The set {p + tv : t in R}.

loop current: The amount of electric current flowing through a 
loop that makes the algebraic sum of the RI voltage drops 
around the loop equal to the algebraic sum of the voltage 
sources in the loop. 

lower triangular matrix: A matrix with zeros above the main 
diagonal. 

lower triangular part (of A): A lower triangular matrix whose 
entries on the main diagonal and below agree with those 
in A. 

LU factorization: The representation of a matrix A in the form 
A = LU where L is a square lower triangular matrix with 
ones on the diagonal (a unit lower triangular matrix) and U 
is an echelon form of A. 

M 

magnitude (of a vector): See norm. 

main diagonal (of a matrix): The entries with equal row and 
column indices. 

mapping: See transformation. 

Markov chain: A sequence of probability vectors x_0, x_1,
x_2, ..., together with a stochastic matrix P such that
x_{k+1} = Px_k for k = 0, 1, 2, ....

matrix: A rectangular array of numbers. 

matrix equation: An equation that involves at least one matrix; 
for instance, Ax = b. 

matrix for T relative to bases B and C: A matrix M for
a linear transformation T: V → W with the property that
[T(x)]_C = M[x]_B for all x in V, where B is a basis for V and
C is a basis for W. When W = V and C = B, the matrix M
is called the B-matrix for T and is denoted by [T]_B.

matrix of observations: A p x N matrix whose columns are 
observation vectors, each column listing p measurements 
made on an individual or object in a specified population 
or set. 

matrix transformation: A mapping x ↦ Ax, where A is an
m x n matrix and x represents any vector in R^n.

maximal linearly independent set (in V): A linearly independent
set B in V such that if a vector v in V but not in B is
added to B, then the new set is linearly dependent.

mean-deviation form (of a matrix of observations): A matrix 
whose row vectors are in mean-deviation form. For each 
row, the entries sum to zero. 

mean-deviation form (of a vector): A vector whose entries sum 
to zero. 

mean square error: The error of an approximation in an inner
product space, where the inner product is defined by a definite
integral.

migration matrix: A matrix that gives the percentage movement
between different locations, from one period to the
next.

minimal spanning set (for a subspace H): A set B that spans
H and has the property that if one of the elements of B is
removed from B, then the new set does not span H.

m x n matrix: A matrix with m rows and n columns.

Moore-Penrose inverse: See pseudoinverse.

multiple regression: A linear model involving several independent
variables and one dependent variable.

N 

nearly singular matrix: An ill-conditioned matrix.

negative definite matrix: A symmetric matrix A such that
x^T Ax < 0 for all x ≠ 0.

negative definite quadratic form: A quadratic form Q such
that Q(x) < 0 for all x ≠ 0.

negative semidefinite matrix: A symmetric matrix A such that
x^T Ax ≤ 0 for all x.

negative semidefinite quadratic form: A quadratic form Q
such that Q(x) ≤ 0 for all x.

nonhomogeneous equation: An equation of the form Ax = b
with b ≠ 0, possibly written as a vector equation or as a
system of linear equations.

nonsingular (matrix): An invertible matrix.

nontrivial solution: A nonzero solution of a homogeneous
equation or system of homogeneous equations.

nonzero (matrix or vector): A matrix (with possibly only one
row or column) that contains at least one nonzero entry.

norm (or length, of v): The scalar ||v|| = √(v · v) = √⟨v, v⟩.

normal equations: The system of equations represented by
A^T Ax = A^T b, whose solution yields all least-squares solutions
of Ax = b. In statistics, a common notation is
X^T Xβ = X^T y.
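
As a small worked illustration (an assumption-laden sketch, not the text's own procedure), the normal equations X^T Xβ = X^T y for a least-squares line y = β_0 + β_1 x form a 2 x 2 system that can be solved directly. The function name `least_squares_line` and the sample data are hypothetical.

```python
from fractions import Fraction

def least_squares_line(points):
    """Solve the 2 x 2 normal equations for the line y = b0 + b1*x.

    The design matrix X has rows (1, x_i), so
    X^T X = [[n, sum x], [sum x, sum x^2]] and X^T y = [sum y, sum x*y].
    """
    pts = [(Fraction(x), Fraction(y)) for x, y in points]
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    det = n * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("x-values must not all be equal")
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

b0, b1 = least_squares_line([(2, 1), (5, 2), (7, 3), (8, 3)])
# b0 == Fraction(2, 7) and b1 == Fraction(5, 14)
```

For larger or ill-conditioned problems, forming X^T X explicitly can amplify roundoff error, and a QR factorization is often preferred.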

normalizing (a nonzero vector v): The process of creating a unit
vector u that is a positive multiple of v.

normal vector (to a subspace V of R^n): A vector n in R^n such
that n · x = 0 for all x in V.

null space (of an m x n matrix A): The set Nul A of all solutions
to the homogeneous equation Ax = 0. Nul A = {x : x is in
R^n and Ax = 0}.

O

observation vector: The vector y in the linear model
y = Xβ + ε, where the entries in y are the observed values
of a dependent variable.

one-to-one (mapping): A mapping T: R^n → R^m such that
each b in R^m is the image of at most one x in R^n.

onto (mapping): A mapping T: R^n → R^m such that each b in
R^m is the image of at least one x in R^n.

open ball B(p, δ) in R^n: The set {x : ||x - p|| < δ} in R^n, where
δ > 0.

open set S in R^n: A set that contains none of its boundary
points. (Equivalently, S is open if every point of S is an 
interior point.) 

origin: The zero vector. 

orthogonal basis: A basis that is also an orthogonal set. 

orthogonal complement (of W): The set W^⊥ of all vectors
orthogonal to W.

orthogonal decomposition: The representation of a vector y
as the sum of two vectors, one in a specified subspace
W and the other in W^⊥. In general, a decomposition
y = c_1 u_1 + ... + c_p u_p, where {u_1, ..., u_p} is an orthogonal
basis for a subspace that contains y.

orthogonally diagonalizable (matrix): A matrix A that admits
a factorization, A = PDP^{-1}, with P an orthogonal matrix
(P^{-1} = P^T) and D diagonal.

orthogonal matrix: A square invertible matrix U such that
U^{-1} = U^T.

orthogonal projection of y onto u (or onto the line through u and
the origin, for u ≠ 0): The vector ŷ defined by
ŷ = ((y · u)/(u · u)) u.

orthogonal projection of y onto W: The unique vector ŷ in W
such that y - ŷ is orthogonal to W. Notation: ŷ = proj_W y.

orthogonal set: A set S of vectors such that u · v = 0 for each
distinct pair u, v in S.

orthogonal to W: Orthogonal to every vector in W. 

orthonormal basis: A basis that is an orthogonal set of unit 
vectors. 

orthonormal set: An orthogonal set of unit vectors. 

outer product: A matrix product uv^T where u and v are vectors
in R^n viewed as n x 1 matrices. (The transpose symbol is on
the “outside” of the symbols u and v.)

overdetermined system: A system of equations with more 
equations than unknowns. 

P

parallel flats: Two or more flats such that each flat is a translate
of the other flats.

parallelogram rule for addition: A geometric interpretation of
the sum of two vectors u, v as the diagonal of the parallelogram
determined by u, v, and 0.

parameter vector: The unknown vector β in the linear model
y = Xβ + ε.

parametric equation of a line: An equation of the form
x = p + tv (t in R).

parametric equation of a plane: An equation of the form
x = p + su + tv (s, t in R), with u and v linearly
independent.

partitioned matrix (or block matrix): A matrix whose entries 
are themselves matrices of appropriate sizes. 


permuted lower triangular matrix: A matrix such that a permutation
of its rows will form a lower triangular matrix.

permuted LU factorization: The representation of a matrix A 
in the form A = LU where L is a square matrix such that 
a permutation of its rows will form a unit lower triangular 
matrix, and U is an echelon form of A. 

pivot: A nonzero number that either is used in a pivot position 
to create zeros through row operations or is changed into a 
leading 1, which in turn is used to create zeros. 

pivot column: A column that contains a pivot position. 

pivot position: A position in a matrix A that corresponds to a 
leading entry in an echelon form of A. 

plane through u, v, and the origin: A set whose parametric
equation is x = su + tv (s, t in R), with u and v linearly
independent.

polar decomposition (of A): A factorization A = PQ, where
P is an n x n positive semidefinite matrix with the same rank
as A, and Q is an n x n orthogonal matrix.

polygon: A polytope in R^2.

polyhedron: A polytope in R^3.

polytope: The convex hull of a finite set of points in R^n (a
special type of compact convex set).

positive combination (of points v_1, ..., v_m in R^n): A linear
combination c_1 v_1 + ... + c_m v_m, where all c_i ≥ 0.

positive definite matrix: A symmetric matrix A such that
x^T Ax > 0 for all x ≠ 0.

positive definite quadratic form: A quadratic form Q such
that Q(x) > 0 for all x ≠ 0.

positive hull (of a set S): The set of all positive combinations
of points in S, denoted by pos S.

positive semidefinite matrix: A symmetric matrix A such that
x^T Ax ≥ 0 for all x.

positive semidefinite quadratic form: A quadratic form Q
such that Q(x) ≥ 0 for all x.

power method: An algorithm for estimating a strictly dominant 
eigenvalue of a square matrix. 
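
A minimal sketch of one common form of the power method, assuming the matrix has a strictly dominant eigenvalue and the starting vector has a nonzero component along the corresponding eigenvector. Each iterate is rescaled so that its entry of largest absolute value is 1; the function names and the fixed iteration count are illustrative assumptions.

```python
def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def power_method(A, x0, iterations=50):
    """Estimate the strictly dominant eigenvalue of A and an eigenvector."""
    x = list(x0)
    mu = 0.0
    for _ in range(iterations):
        y = mat_vec(A, x)
        mu = max(y, key=abs)       # entry of largest absolute value
        x = [yi / mu for yi in y]  # rescale so that entry becomes 1
    return mu, x

# This matrix has eigenvalues 7 and 1, so 7 is strictly dominant.
mu, x = power_method([[6.0, 5.0], [1.0, 2.0]], [0.0, 1.0])
```

Here `mu` approaches 7 and `x` approaches the eigenvector (1, 0.2). Convergence is slow when the second-largest eigenvalue is close in magnitude to the dominant one.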

principal axes (of a quadratic form x^T Ax): The orthonormal
columns of an orthogonal matrix P such that P^{-1}AP is
diagonal. (These columns are unit eigenvectors of A.) Usually
the columns of P are ordered in such a way that the
corresponding eigenvalues of A are arranged in decreasing
order of magnitude.

principal components (of the data in a matrix B of
observations): The unit eigenvectors of a sample covariance
matrix S for B, with the eigenvectors arranged
so that the corresponding eigenvalues of S decrease in
magnitude. If B is in mean-deviation form, then the
principal components are the right singular vectors in a
singular value decomposition of B^T.

probability vector: A vector in R^n whose entries are nonnegative
and sum to one.

product Ax: The linear combination of the columns of A using
the corresponding entries in x as weights.

production vector: The vector in the Leontief input-output
model that lists the amounts that are to be produced by the
various sectors of an economy.

profile (of a set S in R^n): The set of extreme points of S.

projection matrix (or orthogonal projection matrix): A symmetric
matrix B such that B^2 = B. A simple example is
B = vv^T, where v is a unit vector.

proper subset of a set S: A subset of S that does not equal S
itself.

proper subspace: Any subspace of a vector space V other than
V itself.

pseudoinverse (of A): The matrix VD^{-1}U^T, when UDV^T is a
reduced singular value decomposition of A.

Q 

QR factorization: A factorization of an m x n matrix A with 
linearly independent columns, A = QR, where Q is an 
m x n matrix whose columns form an orthonormal basis for 
Col A, and R is an n x n upper triangular invertible matrix 
with positive entries on its diagonal.

quadratic Bézier curve: A curve whose description may be
written in the form g(t) = (1 - t)f_0(t) + t f_1(t) for 0 ≤ t ≤ 1,
where f_0(t) = (1 - t)p_0 + t p_1 and f_1(t) = (1 - t)p_1 +
t p_2. The points p_0, p_1, p_2 are called the control points for
the curve.
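
The nested form of this definition can be sketched directly in Python: two linear interpolations give f_0(t) and f_1(t), and a third gives g(t). The names `lerp` and `quadratic_bezier` and the sample control points are illustrative assumptions.

```python
def lerp(p, q, t):
    """Point (1 - t)p + tq on the segment from p to q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))

def quadratic_bezier(p0, p1, p2, t):
    """Evaluate g(t) = (1 - t) f0(t) + t f1(t) for 0 <= t <= 1."""
    f0 = lerp(p0, p1, t)
    f1 = lerp(p1, p2, t)
    return lerp(f0, f1, t)

# Hypothetical control points in R^2.
p0, p1, p2 = (0.0, 0.0), (1.0, 2.0), (2.0, 0.0)
midpoint = quadratic_bezier(p0, p1, p2, 0.5)  # (1.0, 1.0)
```

The curve starts at p_0 (t = 0) and ends at p_2 (t = 1); in general it does not pass through p_1.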

quadratic form: A function Q defined for x in R^n by Q(x) =
x^T Ax, where A is an n x n symmetric matrix (called the
matrix of the quadratic form).

R 

range (of a linear transformation T): The set of all vectors of 
the form T (x) for some x in the domain of T. 
rank (of a matrix A): The dimension of the column space of A,
denoted by rank A.

Rayleigh quotient: R(x) = (x^T Ax)/(x^T x). An estimate of an
eigenvalue of A (usually a symmetric matrix).

recurrence relation: See difference equation.

reduced echelon form (or reduced row echelon form): A
reduced echelon matrix that is row equivalent to a given
matrix.

reduced echelon matrix: A rectangular matrix in echelon form 
that has these additional properties: The leading entry in 
each nonzero row is 1, and each leading 1 is the only nonzero 
entry in its column. 

reduced singular value decomposition: A factorization
A = UDV^T, for an m x n matrix A of rank r, where U is
m x r with orthonormal columns, D is an r x r diagonal
matrix with the r nonzero singular values of A on its
diagonal, and V is n x r with orthonormal columns.

regression coefficients: The coefficients β_0 and β_1 in the
least-squares line y = β_0 + β_1 x.

regular solid: One of the five possible regular polyhedrons in
R^3: the tetrahedron (4 equal triangular faces), the cube (6
square faces), the octahedron (8 equal triangular faces), the
dodecahedron (12 equal pentagonal faces), and the icosahedron
(20 equal triangular faces).

regular stochastic matrix: A stochastic matrix P such that
some matrix power P^k contains only strictly positive entries.

relative change or relative error (in b): The quantity
||Δb|| / ||b|| when b is changed to b + Δb.

repellor (of a dynamical system in R^2): The origin when all
trajectories except the constant zero sequence or function
tend away from 0.

residual vector: The quantity ε that appears in the general
linear model: y = Xβ + ε; that is, ε = y - Xβ, the difference
between the observed values and the predicted values
(of y).

Re x: The vector in R^n formed from the real parts of the entries
of a vector x in C^n.

right inverse (of A): Any rectangular matrix C such that
AC = I.

right-multiplication (by A): Multiplication of a matrix on the
right by A.

right singular vectors (of A): The columns of V in the singular
value decomposition A = UΣV^T.

roundoff error: Error in floating point arithmetic caused when 
the result of a calculation is rounded (or truncated) to the 
number of floating point digits stored. Also, the error that 
results when the decimal representation of a number such as 
1/3 is approximated by a floating point number with a finite 
number of digits. 

row-column rule: The rule for computing a product AB in 
which the (i, j)-entry of AB is the sum of the products of
corresponding entries from row i of A and column j of B. 

row equivalent (matrices): Two matrices for which there exists 
a (finite) sequence of row operations that transforms one 
matrix into the other. 

row reduction algorithm: A systematic method using elementary
row operations that reduces a matrix to echelon form or
reduced echelon form.

row replacement: An elementary row operation that replaces
one row of a matrix by the sum of the row and a multiple of
another row.

row space (of a matrix A): The set Row A of all linear combinations
of the vectors formed from the rows of A; also denoted
by Col A^T.

row sum: The sum of the entries in a row of a matrix. 

row vector: A matrix with only one row, or a single row of a 
matrix that has several rows. 

row-vector rule for computing Ax: The rule for computing a
product Ax in which the ith entry of Ax is the sum of the
products of corresponding entries from row i of A and from
the vector x.

S

saddle point (of a dynamical system in R^2): The origin when
some trajectories are attracted to 0 and other trajectories are
repelled from 0.

same direction (as a vector v): A vector that is a positive
multiple of v.

sample mean: The average M of a set of vectors, X_1, ..., X_N,
given by M = (1/N)(X_1 + ... + X_N).

scalar: A (real) number used to multiply either a vector or a 
matrix. 

scalar multiple of u by c: The vector cu obtained by multiplying
each entry in u by c.

scale (a vector): Multiply a vector (or a row or column of a 
matrix) by a nonzero scalar. 

Schur complement: A certain matrix formed from the blocks
of a 2 x 2 partitioned matrix A = [A_ij]. If A_11 is invertible,
its Schur complement is given by A_22 - A_21 A_11^{-1} A_12.
If A_22 is invertible, its Schur complement is given by
A_11 - A_12 A_22^{-1} A_21.

Schur factorization (of A, for real scalars): A factorization
A = URU^T of an n x n matrix A having n real eigenvalues,
where U is an n x n orthogonal matrix and R is an upper
triangular matrix.

set spanned by {v_1, ..., v_p}: The set Span {v_1, ..., v_p}.

signal (or discrete-time signal): A doubly infinite sequence of
numbers, {y_k}; a function defined on the integers; belongs to
the vector space S.

similar (matrices): Matrices A and B such that P^{-1}AP = B,
or equivalently, A = PBP^{-1}, for some invertible matrix P.

similarity transformation: A transformation that changes A
into P^{-1}AP.

simplex: The convex hull of an affinely independent finite set
of vectors in R^n.

singular (matrix): A square matrix that has no inverse.

singular value decomposition (of an m x n matrix A): A =
UΣV^T, where U is an m x m orthogonal matrix, V is an
n x n orthogonal matrix, and Σ is an m x n matrix with nonnegative
entries on the main diagonal (arranged in decreasing
order of magnitude) and zeros elsewhere. If rank A = r,
then Σ has exactly r positive entries (the nonzero singular
values of A) on the diagonal.

singular values (of A): The (positive) square roots of the eigenvalues
of A^T A, arranged in decreasing order of magnitude.

size (of a matrix): Two numbers, written in the form m x n,
that specify the number of rows (m) and columns (n) in the
matrix.

solution (of a linear system involving variables x_1, ..., x_n): A
list (s_1, s_2, ..., s_n) of numbers that makes each equation in
the system a true statement when the values s_1, ..., s_n are
substituted for x_1, ..., x_n, respectively.

solution set: The set of all possible solutions of a linear system.
The solution set is empty when the linear system is inconsistent.

Span {v_1, ..., v_p}: The set of all linear combinations of
v_1, ..., v_p. Also, the subspace spanned (or generated) by
v_1, ..., v_p.

spanning set (for a subspace H): Any set {v_1, ..., v_p} in H
such that H = Span {v_1, ..., v_p}.

spectral decomposition (of A): A representation
A = λ_1 u_1 u_1^T + ... + λ_n u_n u_n^T
where {u_1, ..., u_n} is an orthonormal basis of eigenvectors
of A, and λ_1, ..., λ_n are the corresponding eigenvalues of A.

spiral point (of a dynamical system in R^2): The origin when
the trajectories spiral about 0.

stage-matrix model: A difference equation x_{k+1} = Ax_k where
x_k lists the number of females in a population at time k,
with the females classified by various stages of development
(such as juvenile, subadult, and adult).

standard basis: The basis E = {e_1, ..., e_n} for R^n consisting
of the columns of the n x n identity matrix, or the basis
{1, t, ..., t^n} for P_n.

standard matrix (for a linear transformation T): The matrix A
such that T(x) = Ax for all x in the domain of T.

standard position: The position of the graph of an equation
x^T Ax = c, when A is a diagonal matrix.

state vector: A probability vector. In general, a vector that describes
the “state” of a physical system, often in connection
with a difference equation x_{k+1} = Ax_k.

steady-state vector (for a stochastic matrix P): A probability 
vector q such that Pq = q. 

stiffness matrix: The inverse of a flexibility matrix. The jth
column of a stiffness matrix gives the loads that must be
applied at specified points on an elastic beam in order to
produce a unit deflection at the jth point on the beam.

stochastic matrix: A square matrix whose columns are probability
vectors.

strictly dominant eigenvalue: An eigenvalue λ_1 of a matrix A
with the property that |λ_1| > |λ_k| for all other eigenvalues
λ_k of A.

submatrix (of A): Any matrix obtained by deleting some rows
and/or columns of A; also, A itself.

subspace: A subset H of some vector space V such that H has
these properties: (1) the zero vector of V is in H; (2) H
is closed under vector addition; and (3) H is closed under
multiplication by scalars.

supporting hyperplane (to a compact convex set S in R^n): A
hyperplane H = [f : d] such that H ∩ S ≠ ∅ and either
f(x) ≤ d for all x in S or f(x) ≥ d for all x in S.

symmetric matrix: A matrix A such that A^T = A.

system of linear equations (or a linear system): A collection
of one or more linear equations involving the same set of
variables, say, x_1, ..., x_n.

T


tetrahedron: A three-dimensional solid object bounded by four 
equal triangular faces, with three faces meeting at each 
vertex. 

total variance: The trace of the covariance matrix S of a matrix
of observations.

trace (of a square matrix A): The sum of the diagonal entries in 
A, denoted by tr A. 

trajectory: The graph of a solution $\{\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots\}$ of a
dynamical system $\mathbf{x}_{k+1} = A\mathbf{x}_k$, often connected by a thin curve to
make the trajectory easier to see. Also, the graph of $\mathbf{x}(t)$
for $t \geq 0$, when $\mathbf{x}(t)$ is a solution of a differential equation
$\mathbf{x}'(t) = A\mathbf{x}(t)$.

transfer matrix: A matrix A associated with an electrical
circuit having input and output terminals, such that the output
vector is A times the input vector.

transformation (or function, or mapping) T from $\mathbb{R}^n$ to
$\mathbb{R}^m$: A rule that assigns to each vector $\mathbf{x}$ in $\mathbb{R}^n$ a unique
vector $T(\mathbf{x})$ in $\mathbb{R}^m$. Notation: $T : \mathbb{R}^n \to \mathbb{R}^m$. Also,
$T : V \to W$ denotes a rule that assigns to each $\mathbf{x}$ in V a
unique vector $T(\mathbf{x})$ in W.

translation (by a vector p): The operation of adding p to a 
vector or to each vector in a given set. 

transpose (of A): An $n \times m$ matrix $A^T$ whose columns are the
corresponding rows of the $m \times n$ matrix A.

trend analysis: The use of orthogonal polynomials to fit data, 
with the inner product given by evaluation at a finite set of 
points. 

triangle inequality: $\|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|$ for all $\mathbf{u}$, $\mathbf{v}$.

triangular matrix: A matrix A with either zeros above or zeros 
below the diagonal entries. 

trigonometric polynomial: A linear combination of the
constant function 1 and sine and cosine functions such as $\cos nt$
and $\sin nt$.

trivial solution: The solution x = 0 of a homogeneous equation 
Ax = 0. 

u 

uncorrelated variables: Any two variables $x_i$ and $x_j$ (with
$i \neq j$) that range over the $i$th and $j$th coordinates of the
observation vectors in an observation matrix, such that the
covariance $s_{ij}$ is zero.

underdetermined system: A system of equations with fewer 
equations than unknowns. 

uniqueness question: Asks, “If a solution of a system exists, is 
it unique—that is, is it the only one?” 


unit consumption vector: A column vector in the Leontief
input-output model that lists the inputs a sector needs for
each unit of its output; a column of the consumption matrix.
unit lower triangular matrix: A square lower triangular
matrix with ones on the main diagonal.
unit vector: A vector $\mathbf{v}$ such that $\|\mathbf{v}\| = 1$.
upper triangular matrix: A matrix U (not necessarily square)
with zeros below the diagonal entries $u_{11}, u_{22}, \ldots$.

v 

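Vandermonde matrix: An $n \times n$ matrix V or its transpose,
when V has the form

$$V = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix}$$
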

variance (of a variable $x_j$): The diagonal entry $s_{jj}$ in the
covariance matrix S for a matrix of observations, where $x_j$ varies
over the $j$th coordinates of the observation vectors.
vector: A list of numbers; a matrix with only one column. In
general, any element of a vector space.
vector addition: Adding vectors by adding corresponding
entries.

vector equation: An equation involving a linear combination
of vectors with undetermined weights.
vector space: A set of objects, called vectors, on which two
operations are defined, called addition and multiplication by
scalars. Ten axioms must be satisfied. See the first definition
in Section 4.1.

vector subtraction: Computing $\mathbf{u} + (-1)\mathbf{v}$ and writing the
result as $\mathbf{u} - \mathbf{v}$.

w 

weighted least squares: Least-squares problems with a
weighted inner product such as

$\langle \mathbf{x}, \mathbf{y} \rangle = w_1^2 x_1 y_1 + \cdots + w_n^2 x_n y_n$.
weights: The scalars used in a linear combination. 

z 

zero subspace: The subspace {0} consisting of only the zero 
vector. 

zero vector: The unique vector, denoted by $\mathbf{0}$, such that
$\mathbf{u} + \mathbf{0} = \mathbf{u}$ for all $\mathbf{u}$. In $\mathbb{R}^n$, $\mathbf{0}$ is the vector whose entries
are all zeros.




Answers to Odd-Numbered 
Exercises 


Chapter 1 

Section 1.1, page 10


1. The solution is $(x_1, x_2) = (-8, 3)$, or simply $(-8, 3)$.

3. (2,1) 

5. Replace Row 2 by its sum with -4 times Row 3, and then 
replace Row 1 by its sum with 3 times Row 3. 


7. The solution set is empty. 

9. (16,21,14,4) 11. Inconsistent 

13. (5,3,-1) 15. Inconsistent 

17. Calculations show that the system is inconsistent, so the 
three lines have no point in common. 

19. $h \neq 1$   21. All $h$

23. Mark a statement True only if the statement is always true. 
Giving you the answers here would defeat the purpose of 
the true-false questions, which is to help you learn to read 
the text carefully. The Study Guide will tell you where to 
look for the answers, but you should not consult it until you 
have made an honest attempt to find the answers yourself. 


25. k — 2g h = 0 


27. The row reduction of $\begin{bmatrix} a & b & f \\ c & d & g \end{bmatrix}$ to
$\begin{bmatrix} a & b & f \\ 0 & d - b(c/a) & g - f(c/a) \end{bmatrix}$
shows that $d - b(c/a)$ must be nonzero, since $f$ and $g$ are
arbitrary. Otherwise, for some choices of $f$ and $g$ the
second row could correspond to an equation of the form
$0 = q$, where $q$ is nonzero. Thus $ad \neq bc$.


29. Swap Row 1 and Row 3; swap Row 1 and Row 3. 


31. Replace Row 3 by Row 3 + (—4)Row 1; replace Row 3 by 
Row 3 + (4)Row 1. 


33. Review Practice Problem 1 and then write a solution. The 
Study Guide has a solution. 


Section 1.2, page 21 

1. Reduced echelon form: a and b. Echelon form: d. Not in 
echelon form: c. 

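3. $\begin{bmatrix} 1 & 2 & 0 & -8 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. Pivot cols 1 and 3: $\begin{bmatrix} 1 & 2 & 4 & 8 \\ 2 & 4 & 6 & 8 \\ 3 & 6 & 9 & 12 \end{bmatrix}$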


■ * " 


■ ■ * " 


'0 ■- 

0 ■ 


_0 0 

* 

0 0 _ 



乂 

二 

_ 5 一 3x2 

卜 1 = 

= 3 + 

2x 3 

7 . 

X 2 

is 

free. 

9. lx 2 = 

= 3 + 

2X3 


X 3 

= 

3 

1 又 3 is free. 




二 

1^2 — |^3 




1L 

x 2 

is 

free. 






is 

free. 





^1 

= 

5 + 3^5 





Xl 

= 

1 + 4^5 




13. 

A 

is 

free. 





x 4 

二 

4 — 9x 5 





x 5 

is 

free. 





Note: The Study Guide discusses the common mistake 

X3 = 0. 

15. a. Consistent, with many solutions 
b. Consistent, with many solutions 

17. All $h$


19. a. Inconsistent when $h = 2$ and $k \neq 8$

b. Unique solution when $h \neq 2$

c. Many solutions when $h = 2$ and $k = 8$

21. Read the text carefully, and write your answers before you 
consult the Study Guide. Remember, a statement is true 
only if it is true in all cases. 

23. Since there are four pivots (one in each column of the 
coefficient matrix), the augmented matrix must reduce to 
the form 

$\begin{bmatrix} 1 & 0 & 0 & 0 & a \\ 0 & 1 & 0 & 0 & b \\ 0 & 0 & 1 & 0 & c \\ 0 & 0 & 0 & 1 & d \end{bmatrix}$

and so

$x_1 = a$
$x_2 = b$
$x_3 = c$
$x_4 = d$



No matter what the values of a, b, c and d, the solution 
exists and is unique. 

25. If the coefficient matrix has a pivot position in every row, 
then there is a pivot position in the bottom row, and there is 
no room for a pivot in the augmented column. So, the 
system is consistent, by Theorem 2. 

27. If a linear system is consistent, then the solution is unique if 
and only if every column in the coefficient matrix is a pivot 
column; otherwise, there are infinitely many solutions. 

29. An underdetermined system always has more variables than 
equations. There cannot be more basic variables than there 
are equations, so there must be at least one free variable. 
Such a variable may be assigned infinitely many different 
values. If the system is consistent, each different value of a 
free variable will produce a different solution. 

31. Yes, a system of linear equations with more equations than
unknowns can be consistent. The following system has a
solution ($x_1 = x_2 = 1$):

$x_1 + x_2 = 2$
$x_1 - x_2 = 0$
$3x_1 + 2x_2 = 5$
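
The claim in answer 31 is easy to confirm by substitution. The following is a minimal sketch; the tuple layout and the helper name `satisfies_all` are illustrative choices, not part of the text.

```python
# Each row is (coefficient of x1, coefficient of x2, right-hand side),
# copied from the three equations in answer 31.
equations = [
    (1, 1, 2),   # x1 + x2 = 2
    (1, -1, 0),  # x1 - x2 = 0
    (3, 2, 5),   # 3x1 + 2x2 = 5
]


def satisfies_all(x1, x2, system):
    """Return True when (x1, x2) satisfies every equation in the system."""
    return all(a * x1 + b * x2 == rhs for a, b, rhs in system)


# Three equations, two unknowns, and still consistent.
print(satisfies_all(1, 1, equations))  # True
```

A system with more equations than unknowns is consistent exactly when the extra equations happen to agree with the others, as the third equation does here.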

33. p(t) = 1 + 3t +


Section 1.3, page 32 


"-4" 


"5" 

1 


4 




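5. $x_1\begin{bmatrix} 3 \\ -2 \\ 8 \end{bmatrix} + x_2\begin{bmatrix} 5 \\ 0 \\ -9 \end{bmatrix} = \begin{bmatrix} 2 \\ -3 \\ 8 \end{bmatrix}$,

$\begin{bmatrix} 3x_1 \\ -2x_1 \\ 8x_1 \end{bmatrix} + \begin{bmatrix} 5x_2 \\ 0 \\ -9x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -3 \\ 8 \end{bmatrix}$,

$\begin{bmatrix} 3x_1 + 5x_2 \\ -2x_1 \\ 8x_1 - 9x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -3 \\ 8 \end{bmatrix}$

$3x_1 + 5x_2 = 2$
$-2x_1 = -3$
$8x_1 - 9x_2 = 8$

Usually the intermediate steps are not displayed.

7. $\mathbf{a} = \mathbf{u} - 2\mathbf{v}$, $\mathbf{b} = 2\mathbf{u} - 2\mathbf{v}$, $\mathbf{c} = 2\mathbf{u} - 3.5\mathbf{v}$, $\mathbf{d} = 3\mathbf{u} - 4\mathbf{v}$
Yes, every vector in $\mathbb{R}^2$ is a linear combination of $\mathbf{u}$ and $\mathbf{v}$.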


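9. $x_1\begin{bmatrix} 0 \\ 4 \\ -1 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 6 \\ 3 \end{bmatrix} + x_3\begin{bmatrix} 5 \\ -1 \\ -8 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
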


11. No, $\mathbf{b}$ is not a linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$.

13. No, b is not a linear combination of the columns of A. 
15. h = 3 

17. Noninteger weights are acceptable, of course, but some 
simple choices are 0 - Vi + 0 • V 2 = 0, and 


• Vi + 0 ■ v 2 : 


1 • Vl + 1. V 2 : 


0 • Vi + 1 • y 2 ： 


• Vi — 1 • V2 1 


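19. Span $\{\mathbf{v}_1, \mathbf{v}_2\}$ is the set of points on the line through $\mathbf{v}_1$ and
$\mathbf{0}$, because $\mathbf{v}_2$ is a multiple of $\mathbf{v}_1$.

21. Hint: Show that $\begin{bmatrix} 2 & 2 & h \\ -1 & 1 & k \end{bmatrix}$ is consistent for all $h$ and
$k$. Explain what this calculation shows about Span $\{\mathbf{u}, \mathbf{v}\}$.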


23. Before you consult your Study Guide, read the entire 
section carefully. Pay special attention to definitions and 
theorem statements, and note any remarks that precede or 
follow them. 

25. a. No, three b. Yes, infinitely many 

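c. $\mathbf{a}_1 = 1 \cdot \mathbf{a}_1 + 0 \cdot \mathbf{a}_2 + 0 \cdot \mathbf{a}_3$
27. a. $5\mathbf{v}_1$ is the output of 5 days of operation of mine #1.

b. The total output is $x_1\mathbf{v}_1 + x_2\mathbf{v}_2$, so $x_1$ and $x_2$ should
satisfy $x_1\mathbf{v}_1 + x_2\mathbf{v}_2 = \begin{bmatrix} 240 \\ 2824 \end{bmatrix}$.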


c. [M] 1.73 days for mine #1 and 4.70 days for mine #2 
29. (17/14, -34/14,16/14) = (17/14,-17/7, 8/7) 


31. a. $\begin{bmatrix} 10/3 \\ 2 \end{bmatrix}$

b. Add 3.5 g at (0, 1), add 0.5 g at (8, 1), and add 2 g at
(2, 4).

33. Review Practice Problem 1 and then write a solution. The 
Study Guide has a solution. 


Section 1.4, page 40 


1. The product is not defined because the number of columns
(2) in the 3 × 2 matrix does not match the number of entries
(3) in the vector.


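3. a. $A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ -3 & 1 \\ 1 & 6 \end{bmatrix}\begin{bmatrix} -2 \\ 3 \end{bmatrix} = -2\begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix} + 3\begin{bmatrix} 2 \\ 1 \\ 6 \end{bmatrix} = \begin{bmatrix} -2 \\ 6 \\ -2 \end{bmatrix} + \begin{bmatrix} 6 \\ 3 \\ 18 \end{bmatrix} = \begin{bmatrix} 4 \\ 9 \\ 16 \end{bmatrix}$

b. $A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ -3 & 1 \\ 1 & 6 \end{bmatrix}\begin{bmatrix} -2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \cdot (-2) + 2 \cdot (3) \\ (-3) \cdot (-2) + 1 \cdot (3) \\ 1 \cdot (-2) + 6 \cdot (3) \end{bmatrix} = \begin{bmatrix} 4 \\ 9 \\ 16 \end{bmatrix}$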

Show your work here and for Exercises 4-6, but 
thereafter perform the calculations mentally. 
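
The two computations in answer 3 (a linear combination of the columns of A, and the row-vector rule) can be compared side by side. This is a minimal sketch in plain Python; the function names `by_columns` and `by_rows` are illustrative and do not come from the text.

```python
# Matrix and vector from answer 3 of Section 1.4.
A = [[1, 2],
     [-3, 1],
     [1, 6]]
x = [-2, 3]


def by_columns(matrix, weights):
    """Ax as a linear combination of the columns of the matrix."""
    result = [0] * len(matrix)
    for j, weight in enumerate(weights):
        for i, row in enumerate(matrix):
            result[i] += weight * row[j]  # add weight * (column j)
    return result


def by_rows(matrix, vector):
    """Ax by the row-vector rule: each entry is a row dotted with x."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


print(by_columns(A, x))  # [4, 9, 16]
print(by_rows(A, x))     # [4, 9, 16]
```

Both routes perform the same multiplications and only group them differently, which is why the results agree.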



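$c_2 = 1$, $c_3 = 2$, $c_4 = -1$, $c_5 = 2$, and

$\mathbf{v}_1 = \begin{bmatrix} -3 \\ 5 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} 5 \\ 8 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} -4 \\ 1 \end{bmatrix}$,
$\mathbf{v}_4 = \begin{bmatrix} 9 \\ -2 \end{bmatrix}$, $\mathbf{v}_5 = \begin{bmatrix} 7 \\ -4 \end{bmatrix}$, $\mathbf{v}_6 = \begin{bmatrix} 11 \\ -11 \end{bmatrix}$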


29. Hint: Start with any 3x3 matrix B in echelon form that 
has three pivot positions. 

31. Write your solution before you check the Study Guide. 


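7. $\begin{bmatrix} 4 & -5 & 7 \\ -1 & 3 & -8 \\ 7 & -5 & 0 \\ -4 & 1 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 6 \\ -8 \\ 0 \\ -7 \end{bmatrix}$

9. $x_1\begin{bmatrix} 5 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 2 \end{bmatrix} + x_3\begin{bmatrix} -3 \\ 4 \end{bmatrix} = \begin{bmatrix} 8 \\ 0 \end{bmatrix}$ and
$\begin{bmatrix} 5 & 1 & -3 \\ 0 & 2 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 8 \\ 0 \end{bmatrix}$

11. $\begin{bmatrix} 1 & 3 & -4 & -2 \\ 1 & 5 & 2 & 4 \\ -3 & -7 & 6 & 12 \end{bmatrix}$, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -11 \\ 3 \\ 0 \end{bmatrix}$
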


33. Hint: How many pivot columns does A have? Why? 

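35. Suppose $\mathbf{y}$ and $\mathbf{z}$ satisfy $A\mathbf{y} = \mathbf{z}$. Then $5\mathbf{z} = 5A\mathbf{y}$. By
Theorem 5(b), $5A\mathbf{y} = A(5\mathbf{y})$. So $5\mathbf{z} = A(5\mathbf{y})$, which shows
that $5\mathbf{y}$ is a solution of $A\mathbf{x} = 5\mathbf{z}$. Thus the equation
$A\mathbf{x} = 5\mathbf{z}$ is consistent.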

37. [M] The columns do not span R 4 . 

39. [M] The columns span R 4 . 

41. [M] Delete column 4 of the matrix in Exercise 39. It is also 
possible to delete column 3 instead of column 4. 
13. Yes. (Justify your answer.)

15. The equation $A\mathbf{x} = \mathbf{b}$ is not consistent when $3b_1 + b_2$ is
nonzero. (Show your work.) The set of $\mathbf{b}$ for which the
equation is consistent is a line through the origin: the set of
all points $(b_1, b_2)$ satisfying $b_2 = -3b_1$.

17. Only three rows contain a pivot position. The equation
$A\mathbf{x} = \mathbf{b}$ does not have a solution for each $\mathbf{b}$ in $\mathbb{R}^4$, by
Theorem 4.

19. The work in Exercise 17 shows that statement (d) in
Theorem 4 is false. So all four statements in Theorem 4 are
false. Thus, not all vectors in $\mathbb{R}^4$ can be written as a linear
combination of the columns of A. Also, the columns of A
do not span $\mathbb{R}^4$.

21. The matrix $[\mathbf{v}_1 \ \mathbf{v}_2 \ \mathbf{v}_3]$ does not have a pivot in each row,
so the columns of the matrix do not span $\mathbb{R}^4$, by Theorem 4.
That is, $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ does not span $\mathbb{R}^4$.

1. The system has a nontrivial solution because there is a free
variable, $x_3$.

3. The system has a nontrivial solution because there is a free
variable, $x_3$.

5. $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_3\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}$

7. $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = x_3\begin{bmatrix} -9 \\ 4 \\ 1 \\ 0 \end{bmatrix} + x_4\begin{bmatrix} 8 \\ -5 \\ 0 \\ 1 \end{bmatrix}$


9. X = X 2 


0 


23. Read the text carefully and try to mark each exercise
statement True or False before you consult the Study Guide.
Several parts of Exercises 29 and 30 are implications of the
form

“If ⟨statement 1⟩, then ⟨statement 2⟩”

or equivalently,

“⟨statement 2⟩, if ⟨statement 1⟩”

Mark such an implication as True if ⟨statement 2⟩ is true in
all cases when ⟨statement 1⟩ is true.

25. $c_1 = -3$, $c_2 = -1$, $c_3 = 2$
27. The matrix equation can be written as

11. Hint: The system derived from the reduced echelon form is

$x_1 - 4x_2 + 5x_6 = 0$
$x_3 - x_6 = 0$
$x_5 - 4x_6 = 0$
$0 = 0$

The basic variables are $x_1$, $x_3$, and $x_5$. The remaining
variables are free. The Study Guide discusses two mistakes
that are often made on this type of problem.

13. $\mathbf{x} = \begin{bmatrix} 5 \\ -2 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 4 \\ -7 \\ 1 \end{bmatrix} = \mathbf{p} + x_3\mathbf{q}$. Geometrically, the
solution set is the line through $\begin{bmatrix} 5 \\ -2 \\ 0 \end{bmatrix}$ parallel to $\begin{bmatrix} 4 \\ -7 \\ 1 \end{bmatrix}$.
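15. Let $\mathbf{u} = \begin{bmatrix} -5 \\ 1 \\ 0 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$, $\mathbf{p} = \begin{bmatrix} -2 \\ 0 \\ 0 \end{bmatrix}$. The solution of
the homogeneous equation is $\mathbf{x} = x_2\mathbf{u} + x_3\mathbf{v}$, the plane
through the origin spanned by $\mathbf{u}$ and $\mathbf{v}$. The solution set of
the nonhomogeneous system is $\mathbf{x} = \mathbf{p} + x_2\mathbf{u} + x_3\mathbf{v}$, the
plane through $\mathbf{p}$ parallel to the solution set of the
homogeneous equation.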



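17. $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 8 \\ -4 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}$. The solution set is the
line through $\begin{bmatrix} 8 \\ -4 \\ 0 \end{bmatrix}$, parallel to the line that is the solution
set of the homogeneous system in Exercise 5.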


19. $\mathbf{x} = \mathbf{a} + t\mathbf{b}$, where $t$ represents a parameter, or
$x_1 = -2 - 5t$, $x_2 = 3t$

21. $\mathbf{x} = \mathbf{p} + t(\mathbf{q} - \mathbf{p}) =$

23. It is important to read the text carefully and write your 
answers. After that, check the Study Guide, if necessary. 

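25. a. $A(\mathbf{p} + \mathbf{v}_h) = A\mathbf{p} + A\mathbf{v}_h = \mathbf{b} + \mathbf{0} = \mathbf{b}$
b. $A\mathbf{v}_h = A(\mathbf{w} - \mathbf{p}) = A\mathbf{w} - A\mathbf{p} = \mathbf{b} - \mathbf{b} = \mathbf{0}$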

27. (Geometric argument using Theorem 6) Since the equation
Ax = b is consistent, its solution set is obtained by
translating the solution set of Ax = 0, by Theorem 6. So
the solution set of Ax = b is a single vector if and only if
the solution set of Ax = 0 is a single vector, and that
happens if and only if Ax = 0 has only the trivial solution.

(Proof using free variables) If Ax = b has a solution, then
the solution is unique if and only if there are no free
variables in the corresponding system of equations, that is,
if and only if every column of A is a pivot column. This
happens if and only if the equation Ax = 0 has only the
trivial solution.

29. a. When A is a 4 × 4 matrix with three pivot positions, the
equation Ax = 0 has a free variable and hence has
nontrivial solutions.

b. With three pivot positions, A does not have a pivot
position in each of its four rows. By Theorem 4 in
Section 1.4, the equation Ax = b does not have a
solution for every possible b. The word “possible” in
the exercise means that the only vectors considered in
this case are those in $\mathbb{R}^4$, because A has four rows.

31. a. When A is a 3 × 2 matrix with two pivot positions, each
column is a pivot column. So the equation Ax = 0 has
no free variables and hence no nontrivial solution.
b. With two pivot positions and three rows, A cannot have
a pivot in every row. So the equation Ax = b cannot
have a solution for every possible b (in $\mathbb{R}^3$), by
Theorem 4 in Section 1.4.


33. Your example should have the property that the sum of the 
entries in each row is zero. Why? 

厂 3" 

35. One answer: x = n 


37. One answer is ^4 = ^ ^ . The Study Guide shows 

how to analyze the problem in order to construct A. If b is 
any vector not a multiple of the first column of A, then the 
solution set of Ax = b is empty and thus cannot be formed 
by translating the solution set of Ax = 0. This does not 
contradict Theorem 6, because that theorem applies when 
the equation Ax = b has a nonempty solution set. 


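39. Suppose $A\mathbf{v} = \mathbf{0}$ and $A\mathbf{w} = \mathbf{0}$. Then, since
$A(\mathbf{v} + \mathbf{w}) = A\mathbf{v} + A\mathbf{w}$ by Theorem 5(a) in Section 1.4,
$A(\mathbf{v} + \mathbf{w}) = A\mathbf{v} + A\mathbf{w} = \mathbf{0} + \mathbf{0} = \mathbf{0}$. Now, let $c$ and $d$ be
scalars. Using both parts of Theorem 5, $A(c\mathbf{v} + d\mathbf{w}) =
A(c\mathbf{v}) + A(d\mathbf{w}) = cA\mathbf{v} + dA\mathbf{w} = c\mathbf{0} + d\mathbf{0} = \mathbf{0}$.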
1. The general solution is $p_G = .875p_S$, with $p_S$ free. One
equilibrium solution is $p_S = 1000$ and $p_G = 875$. Using
fractions, the general solution could be written
$p_G = (7/8)p_S$, and a natural choice of prices might be
$p_S = 80$ and $p_G = 70$. Only the ratio of the prices is
important. The economic equilibrium is unaffected by a
proportional change in prices.


3. a. Distribution of 

Output from: 
F&P Man. Ser. 


Output 

.1 

.8 


•8 


r Input Purchased by: 
.2 — F&P 

•4 —► Man. 

•4 —^ Ser. 


.9 -.1 -.2 0 

b. -.8 .9 -.4 0 

— .1 — .8 .6 0 


c. 

[M] p F& p ^ 

30, 

11, Ps = 100. 


5. a. Distribution of Output from:

Ag.    Man.   Ser.   Transp.
Output                             Input    Purchased by:
.20    .35    .10    .20     →     Ag.
.20    .10    .20    .30     →     Man.
.30    .35    .50    .20     →     Ser.
.30    .20    .20    .30     →     Transp.

b. One solution is $p_A = 7.99$, $p_M = 8.36$, $p_S = 14.65$,
and $p_T = 10.00$.

c.

Ag.    Man.   Ser.   Transp.
Output                             Input    Purchased by:
.20    .35    .10    .20     →     Ag.
.10    .10    .20    .30     →     Man.
.40    .35    .50    .20     →     Ser.
.30    .20    .20    .30     →     Transp.

One solution is $p_A = 7.81$, $p_M = 7.67$, $p_S = 15.62$,
and $p_T = 10.00$.

The campaign has benefited Services the most.

7. 3NaHCO3 + H3C6H5O7 → Na3C6H5O7 + 3H2O + 3CO2
9. B2S3 + 6H2O → 2H3BO3 + 3H2S
11. [M] 16MnS + 13As2Cr10O35 + 374H2SO4

→ 16HMnO4 + 26AsH3 + 130CrS3O12 + 327H2O
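
A balanced equation must have the same count of every atom on both sides, so answers 7 through 11 can be checked mechanically. The sketch below assumes formulas written without parentheses, as they are in these answers; `count_atoms` and `total_atoms` are hypothetical helper names.

```python
import re
from collections import Counter


def count_atoms(formula, coefficient=1):
    """Count atoms in a formula such as 'H3C6H5O7' (no parentheses supported)."""
    counts = Counter()
    # An element is an uppercase letter with an optional lowercase letter,
    # followed by an optional count (a missing count means 1).
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += coefficient * (int(digits) if digits else 1)
    return counts


def total_atoms(terms):
    """Sum the atom counts over (coefficient, formula) pairs on one side."""
    total = Counter()
    for coefficient, formula in terms:
        total += count_atoms(formula, coefficient)
    return total


# Answer 7: 3NaHCO3 + H3C6H5O7 -> Na3C6H5O7 + 3H2O + 3CO2
left = [(3, "NaHCO3"), (1, "H3C6H5O7")]
right = [(1, "Na3C6H5O7"), (3, "H2O"), (3, "CO2")]
print(total_atoms(left) == total_atoms(right))  # True
```

The same check applies to answers 9 and 11 by swapping in their coefficients and formulas.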


13. a. 


•^1 = 

x 2 = 

=x 3 — 40 

=X 3 + 10 


x 2 = 

= 50 

Xri is free 

b. 

^3 = 

= 40 

— 

=x 6 -\- 50 

X 4 = 

= 50 

x 5 = 

=^6 + 60 


又 5 = 

= 60 

x 6 is free 





xi = 60 + xe 
X2 ― — 10 
x 3 = 90 + x 6 

•^•4 = ^6 
X 5 — 80 Xg 

X 6 is free 

In order for the flow to be nonnegative, > 10 


Section 1.7, page 60 

Justify your answers to Exercises 1-22. 

1. Lin. indep. 3. Lin. depen. 

5. Lin. indep. 7. Lin. depen. 

9. a. No $h$   b. All $h$
11. $h = -4$   13. All $h$

15. Lin. depen. 17. Lin. depen. 19. Lin. indep. 

21. If you consult your Study Guide before you make a good 
effort to answer the true-false questions, you will destroy 

0" 

0 


27. All four columns of the 6x4 matrix A must be pivot 
columns. Otherwise, the equation Ax = 0 would have a 
free variable, in which case the columns of A would be 
linearly dependent. 


23. 


25. 


most of their value. 

0 ■ 


0 


0 0 


■ 

氺 


"0 

■ 

0 

■ 

and 

0 

0 

0 

0 

0 

0 

0 

0 _ 


0 

0 


29. A: Any 3x2 matrix with the second column a multiple of 
the first will have the desired property. 

B: Any 3 × 2 matrix with two nonzero columns such that
neither column is a multiple of the other will work. In this
case, the columns form a linearly independent set, and so
the equation Bx = 0 has only the trivial solution.

31. $\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$

33. True, by Theorem 7. (The Study Guide adds another 
justification.) 

35. True, by Theorem 9. 

37. True. A linear dependence relation among $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$
may be extended to a linear dependence relation among $\mathbf{v}_1$,
$\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$ by placing a zero weight on $\mathbf{v}_4$.

39. You should be able to work this important problem without 
help. Write your solution before you consult the Study 
Guide. 

41. [M] Using the pivot columns of A,

$B = \begin{bmatrix} 3 & -4 & 7 \\ -5 & -3 & -11 \\ 4 & 3 & 2 \\ 8 & -7 & 4 \end{bmatrix}$

Other choices are possible.

43. [M] Each column of A that is not a column of B is in the
set spanned by the columns of B.
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2 ' 


2 a 

—6 

' 

■ 2 b _ 


5. x = 1 , not unique 

_ 0 _ 

"4" 

Q 3 

9. x = X 3 1 

0 


3. x = 6 , unique solution 

_3_ 

7. a = 5, b = 6 


11. Yes, because the system represented by [ A b ] is 
consistent. 



A reflection through the origin 
A reflection through the line x1 =


17. 



19. 


"13' 


2 ^i — X2 

■ 1 ■ 

' 

5xi + 6 x 2 


21. Read the text carefully and write your answers before you
check the Study Guide. Notice that Exercise 21(e) is a
sentence of the form

“⟨statement 1⟩ if and only if ⟨statement 2⟩”

Mark such a sentence as True if ⟨statement 1⟩ is true
whenever ⟨statement 2⟩ is true and also ⟨statement 2⟩ is true
whenever ⟨statement 1⟩ is true.


23. a. When $b = 0$, $f(x) = mx$. In this case, for all $x$, $y$ in $\mathbb{R}$
and all scalars $c$ and $d$,
$f(cx + dy) = m(cx + dy) = mcx + mdy$
$= c(mx) + d(my)$

$= c \cdot f(x) + d \cdot f(y)$




31. Hint: Since $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly dependent, you can write
a certain equation and work with it.

33. One possibility is to show that T does not map the zero
vector into the zero vector, something that every linear
transformation does do: T(0, 0) = (0, −3, 0).

35. Take $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^3$ and let $c$ and $d$ be scalars. Then
$c\mathbf{u} + d\mathbf{v} = (cu_1 + dv_1, cu_2 + dv_2, cu_3 + dv_3)$

The transformation T is linear because
$T(c\mathbf{u} + d\mathbf{v}) = (cu_1 + dv_1, 0, cu_3 + dv_3)$

$= (cu_1, 0, cu_3) + (dv_1, 0, dv_3)$

$= c(u_1, 0, u_3) + d(v_1, 0, v_3)$

$= cT(\mathbf{u}) + dT(\mathbf{v})$

37. [M] All multiples of (-1,-1,1,0) 

39. [M] Yes. One choice for x is (1,2,0,0). 


This shows that $f$ is linear.

b. When $f(x) = mx + b$, with $b$ nonzero,

$f(0) = m(0) + b = b \neq 0$.

c. In calculus, $f$ is called a “linear function” because the
graph of $f$ is a line.


25. Hint: Show that the image of a line (that is, the set of
images of all points on a line) can be represented by the
parametric equation of a line.

27. Any point $\mathbf{x}$ on the plane P satisfies the parametric equation
$\mathbf{x} = s\mathbf{u} + t\mathbf{v}$ for some values of $s$ and $t$. By linearity, the
image $T(\mathbf{x})$ satisfies the parametric equation

$T(\mathbf{x}) = sT(\mathbf{u}) + tT(\mathbf{v})$   ($s$, $t$ in $\mathbb{R}$)   (*)

The set of images is just Span $\{T(\mathbf{u}), T(\mathbf{v})\}$. If $T(\mathbf{u})$ and
$T(\mathbf{v})$ are linearly independent, Span $\{T(\mathbf{u}), T(\mathbf{v})\}$ is a plane
through $T(\mathbf{u})$, $T(\mathbf{v})$, and $\mathbf{0}$. If $T(\mathbf{u})$ and $T(\mathbf{v})$ are linearly
dependent and not both zero, then Span $\{T(\mathbf{u}), T(\mathbf{v})\}$ is a
line through $\mathbf{0}$. If $T(\mathbf{u}) = T(\mathbf{v}) = \mathbf{0}$, then
Span $\{T(\mathbf{u}), T(\mathbf{v})\}$ is $\{\mathbf{0}\}$.


Section 1.9, page 78 
"3 -5" 

1 0 

0 -1 " 

-1 0 _ 

11. The described transformation T maps $\mathbf{e}_1$ into $-\mathbf{e}_1$ and maps
$\mathbf{e}_2$ into $-\mathbf{e}_2$. A rotation through $\pi$ radians also maps $\mathbf{e}_1$ into
$-\mathbf{e}_1$ and maps $\mathbf{e}_2$ into $-\mathbf{e}_2$. Since a linear transformation is
completely determined by what it does to the columns of
the identity matrix, the rotation transformation has the same
effect as T on every vector in $\mathbb{R}^2$.


7. 


-1/V2 1/V2 

1/V^2 1/V2 


9. 



13. 


x i 
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15. 


19. 


-4 0 

0 -1 
-1 3 

-5 ^ 

1 -( 


17. 


2 0 0 

0 0 0 

2 0 1 

1 0 -1 


21. x 


23. Answer the questions before checking the Study Guide. 
Justify your answers to Exercises 25-28. 

25. Not one-to-one and does not map R 4 onto R 4 
27. Not one-to-one but maps R 3 onto R 2 


29. 


■ 氺氺 

0 ■氺 

0 0 ■ 


0 0 0 

31. n. (Explain why, and then check the Study Guide.) 

33. Hint: If $\mathbf{e}_j$ is the $j$th column of $I_n$, then $B\mathbf{e}_j$ is the $j$th
column of B.

35. Hint: Is it possible that $m > n$? What about $m < n$?
37. [M] No. (Explain why.) 

39. [M] No. (Explain why.) 

Section 1.10, page 86


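1. a. $x_1\begin{bmatrix} 110 \\ 4 \\ 20 \\ 2 \end{bmatrix} + x_2\begin{bmatrix} 130 \\ 3 \\ 18 \\ 5 \end{bmatrix} = \begin{bmatrix} 295 \\ 9 \\ 48 \\ 8 \end{bmatrix}$, where $x_1$ is the
number of servings of Cheerios and $x_2$ is the number of
servings of 100% Natural Cereal.

b. $\begin{bmatrix} 110 & 130 \\ 4 & 3 \\ 20 & 18 \\ 2 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 295 \\ 9 \\ 48 \\ 8 \end{bmatrix}$. Mix 1.5 servings of
Cheerios together with 1 serving of 100% Natural
Cereal.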


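7. $R\mathbf{i} = \mathbf{v}$, $\begin{bmatrix} 12 & -7 & 0 & -4 \\ -7 & 15 & -6 & 0 \\ 0 & -6 & 14 & -5 \\ -4 & 0 & -5 & 13 \end{bmatrix}\begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{bmatrix} = \begin{bmatrix} 40 \\ 30 \\ 20 \\ -10 \end{bmatrix}$

[M]: $\mathbf{i} = \begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{bmatrix} = \begin{bmatrix} 11.43 \\ 10.55 \\ 8.04 \\ 5.84 \end{bmatrix}$

9. $\mathbf{x}_{k+1} = M\mathbf{x}_k$ for $k = 0, 1, 2, \ldots$, where
$M = \begin{bmatrix} .93 & .05 \\ .07 & .95 \end{bmatrix}$ and $\mathbf{x}_0 = \begin{bmatrix} 800{,}000 \\ 500{,}000 \end{bmatrix}$.

The population in 2012 (for $k = 2$) is $\mathbf{x}_2 = \begin{bmatrix} 741{,}720 \\ 558{,}280 \end{bmatrix}$.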
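
The value of $\mathbf{x}_2$ in answer 9 follows from applying the difference equation twice. Here is a minimal sketch using exact fractions so the arithmetic stays free of rounding; the helper names `step` and `population_after` are illustrative only.

```python
from fractions import Fraction

# Migration matrix from answer 9, stored as exact fractions so that
# repeated multiplication does not pick up floating-point noise.
M = [[Fraction(93, 100), Fraction(5, 100)],
     [Fraction(7, 100), Fraction(95, 100)]]


def step(matrix, vector):
    """Return the matrix-vector product for a list-of-rows matrix."""
    return [sum(entry * v for entry, v in zip(row, vector)) for row in matrix]


def population_after(years, start):
    """Apply x_{k+1} = M x_k the given number of times."""
    x = list(start)
    for _ in range(years):
        x = step(M, x)
    return x


x2 = population_after(2, [800_000, 500_000])
print([int(v) for v in x2])  # [741720, 558280]
```

Because each column of M sums to 1, the total population stays at 1,300,000 from one year to the next.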


11. a. $M = \begin{bmatrix} .98363 & .00167 \\ .01637 & .99833 \end{bmatrix}$

b. [M] $\mathbf{x}_6 = \begin{bmatrix} 30{,}754{,}500 \\ 229{,}449{,}000 \end{bmatrix}$


13. [M] 

a. The population of the city decreases. After 7 years, the 
populations are about equal, but the city population 
continues to decline. After 20 years, there are only 
417,000 persons in the city (417,456 rounded off). 
However, the changes in population seem to grow 
smaller each year. 

b. The city population is increasing slowly, and the 
suburban population is decreasing. After 20 years, the 
city population has grown from 350,000 to about 
370,000. 
1. a. F   b. F   c. T   d. F   e. T   f. T   g. F
h. F   i. T   j. F   k. T   l. F   m. T   n. T
o. T   p. T   q. F   r. T   s. F   t. T   u. F
v. F   w. F   x. T   y. T   z. F










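3. a. She should mix .99 serving of Mac and Cheese, 1.54
servings of broccoli, and .79 serving of chicken to get
her desired nutritional content.

b. She should mix 1.09 servings of shells and white
cheddar, .88 serving of broccoli, and 1.03 servings of
chicken to get her desired nutritional content. Notice
that this mix contains significantly less broccoli, so she
should like it better.

5. $R\mathbf{i} = \mathbf{v}$, $\begin{bmatrix} 11 & -5 & 0 & 0 \\ -5 & 10 & -1 & 0 \\ 0 & -1 & 9 & -2 \\ 0 & 0 & -2 & 10 \end{bmatrix}\begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{bmatrix} = \begin{bmatrix} 50 \\ -40 \\ 30 \\ -30 \end{bmatrix}$

[M]: $\mathbf{i} = \begin{bmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{bmatrix} = \begin{bmatrix} 3.68 \\ -1.90 \\ 2.57 \\ -2.49 \end{bmatrix}$

3. a. Any consistent linear system whose echelon form is

$\begin{bmatrix} \blacksquare & * & * & * \\ 0 & \blacksquare & * & * \\ 0 & 0 & 0 & 0 \end{bmatrix}$ or $\begin{bmatrix} \blacksquare & * & * & * \\ 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & 0 \end{bmatrix}$ or $\begin{bmatrix} 0 & \blacksquare & * & * \\ 0 & 0 & \blacksquare & * \\ 0 & 0 & 0 & 0 \end{bmatrix}$
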

b. Any consistent linear system whose reduced echelon 
form is I 3 . 

c. Any inconsistent linear system of three equations in 
three variables. 

5. a. The solution set: (i) is empty if $h = 12$ and $k \neq 2$; (ii)
contains a unique solution if $h \neq 12$; (iii) contains
infinitely many solutions if $h = 12$ and $k = 2$.
b. The solution set is empty if $k + 3h = 0$; otherwise, the
solution set contains a unique solution.
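7. a. Set $\mathbf{v}_1 = \begin{bmatrix} 2 \\ -5 \\ 7 \end{bmatrix}$, $\mathbf{v}_2 = \begin{bmatrix} -4 \\ 1 \\ -5 \end{bmatrix}$, $\mathbf{v}_3 = \begin{bmatrix} -2 \\ 1 \\ -3 \end{bmatrix}$, and
$\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$. “Determine if $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ span $\mathbb{R}^3$.”
Solution: No.

b. Set $A = \begin{bmatrix} 2 & -4 & -2 \\ -5 & 1 & 1 \\ 7 & -5 & -3 \end{bmatrix}$. “Determine if the columns
of A span $\mathbb{R}^3$.”

c. Define $T(\mathbf{x}) = A\mathbf{x}$. “Determine if T maps $\mathbb{R}^3$ onto $\mathbb{R}^3$.”
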

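9. $\begin{bmatrix} 5 \\ 6 \end{bmatrix} = \frac{4}{3}\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \frac{7}{3}\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ or $\begin{bmatrix} 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 8/3 \\ 4/3 \end{bmatrix} + \begin{bmatrix} 7/3 \\ 14/3 \end{bmatrix}$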

10. Hint: Construct a “grid” on the $x_1x_2$-plane determined by
$\mathbf{a}_1$ and $\mathbf{a}_2$.


11. A solution set is a line when the system has one free
variable. If the coefficient matrix is 2 × 3, then two of the
columns should be pivot columns. For instance, take
$\begin{bmatrix} 1 & 2 & * \\ 0 & 3 & * \end{bmatrix}$. Put anything in column 3. The resulting
matrix will be in echelon form. Make one row replacement
operation on the second row to create a matrix not in
echelon form, such as $\begin{bmatrix} 1 & 2 & 1 \\ 0 & 3 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & 1 \\ 1 & 5 & 2 \end{bmatrix}$.

12. Hint: How many free variables are in the equation Ax = 0?

1 0 -3*


13. E 


15. a. If the three vectors are linearly independent, then $a$, $c$,
and $f$ must all be nonzero.
b. The numbers $a, \ldots, f$ can have any values.

16. Hint: List the columns from right to left as $\mathbf{v}_1, \ldots, \mathbf{v}_4$.


17. Hint: Use Theorem 7. 


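19. Let M be the line through the origin that is parallel to the
line through $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Then $\mathbf{v}_2 - \mathbf{v}_1$ and $\mathbf{v}_3 - \mathbf{v}_1$ are
both on M. So one of these two vectors is a multiple of the
other, say $\mathbf{v}_2 - \mathbf{v}_1 = k(\mathbf{v}_3 - \mathbf{v}_1)$. This equation produces a
linear dependence relation: $(k - 1)\mathbf{v}_1 + \mathbf{v}_2 - k\mathbf{v}_3 = \mathbf{0}$.

A second solution: A parametric equation of the line is
$\mathbf{x} = \mathbf{v}_1 + t(\mathbf{v}_2 - \mathbf{v}_1)$. Since $\mathbf{v}_3$ is on the line, there is some $t_0$
such that $\mathbf{v}_3 = \mathbf{v}_1 + t_0(\mathbf{v}_2 - \mathbf{v}_1) = (1 - t_0)\mathbf{v}_1 + t_0\mathbf{v}_2$. So $\mathbf{v}_3$
is a linear combination of $\mathbf{v}_1$ and $\mathbf{v}_2$, and $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is
linearly dependent.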

21. $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$   23. $a = 4/5$, $b = -3/5$


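25. a. The vector $x_1\begin{bmatrix} 3 \\ 7 \\ 8 \end{bmatrix}$ lists the numbers of three-, two-, and
one-bedroom apartments provided when $x_1$ floors of
plan A are constructed.

b. $x_1\begin{bmatrix} 3 \\ 7 \\ 8 \end{bmatrix} + x_2\begin{bmatrix} 4 \\ 4 \\ 8 \end{bmatrix} + x_3\begin{bmatrix} 5 \\ 3 \\ 9 \end{bmatrix}$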


c. [M] Use 2 floors of plan A and 15 floors of plan B. Or, 
use 6 floors of plan A, 2 floors of plan B, and 8 floors of 
plan C. These are the only feasible solutions. There are 
other mathematical solutions, but they require a 
negative number (or a fractional number) of floors of 
one or two of the plans, which makes no physical sense. 
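
The two feasible plans in part (c) can be confirmed by brute force. The sketch below is an illustration under stated assumptions: the target totals are not quoted in the answer, so they are computed from the first stated solution, and the search bound of 30 floors per plan is an arbitrary illustrative limit.

```python
# Apartments per floor (three-, two-, one-bedroom) for each plan,
# read from the vectors in part (b).
PLAN_A = (3, 7, 8)
PLAN_B = (4, 4, 8)
PLAN_C = (5, 3, 9)


def apartments(x1, x2, x3):
    """Totals of each apartment type for x1, x2, x3 floors of plans A, B, C."""
    return tuple(x1 * a + x2 * b + x3 * c
                 for a, b, c in zip(PLAN_A, PLAN_B, PLAN_C))


# Assumption: take the totals produced by the first stated solution as the target.
target = apartments(2, 15, 0)

# Search all nonnegative whole-number floor counts up to an illustrative bound.
feasible = [(x1, x2, x3)
            for x1 in range(31)
            for x2 in range(31)
            for x3 in range(31)
            if apartments(x1, x2, x3) == target]

print(target)    # (66, 74, 136)
print(feasible)  # [(2, 15, 0), (6, 2, 8)]
```

The search returns exactly the two plans named in the answer, which matches the remark that every other mathematical solution needs a negative or fractional number of floors.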


Chapter 2 


Section 2.1, page 100 


-4 0 2 

-8 10 -4 

1 13" 

-7 -6 


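6 -15
9 -6

5. a. $A\mathbf{b}_1 = \begin{bmatrix} -10 \\ 0 \\ 26 \end{bmatrix}$, $A\mathbf{b}_2 = \begin{bmatrix} 11 \\ 8 \\ -19 \end{bmatrix}$,
$AB = \begin{bmatrix} -10 & 11 \\ 0 & 8 \\ 26 & -19 \end{bmatrix}$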


,not defined, 


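b. $AB = \begin{bmatrix} -1 \cdot 4 + 3(-2) & -1(-2) + 3 \cdot 3 \\ 2 \cdot 4 + 4(-2) & 2(-2) + 4 \cdot 3 \\ 5 \cdot 4 - 3(-2) & 5(-2) - 3 \cdot 3 \end{bmatrix} = \begin{bmatrix} -10 & 11 \\ 0 & 8 \\ 26 & -19 \end{bmatrix}$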
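
Parts (a) and (b) compute the same product in two ways: column by column, and entry by entry. In the sketch below, the matrices A and B are read off from the arithmetic shown in part (b), which is an inference rather than a quotation of the exercise; the helper names are illustrative.

```python
# Assumed factors, inferred from the entry-by-entry arithmetic in part (b).
A = [[-1, 3],
     [2, 4],
     [5, -3]]
B = [[4, -2],
     [-2, 3]]


def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def column(matrix, j):
    """Return column j of a list-of-rows matrix as a list."""
    return [row[j] for row in matrix]


def matmul(left, right):
    """Row-column rule: entry (i, j) is row i of left dotted with column j of right."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in left]


# Part (a): the columns of AB are A*b1 and A*b2.
print(matvec(A, column(B, 0)))  # [-10, 0, 26]
print(matvec(A, column(B, 1)))  # [11, 8, -19]

# Part (b): the row-column rule gives the same matrix.
print(matmul(A, B))  # [[-10, 11], [0, 8], [26, -19]]
```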


7. 

3x7 


9. 

k = 

一 2 






11. $AD = \begin{bmatrix} 5 & 6 & 6 \\ 10 & 12 & 10 \\ 15 & 15 & 12 \end{bmatrix}$, $DA = \begin{bmatrix} 5 & 10 & 15 \\ 6 & 12 & 15 \\ 6 & 10 & 12 \end{bmatrix}$

Right-multiplication (that is, multiplication on the right) by 
D multiplies each column of A by the corresponding 
diagonal entry of D . Left-multiplication by D multiplies 
each row of A by the corresponding diagonal entry of D. 
The Study Guide tells how to make AB = BA, but you 
should try this yourself before looking there. 
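
The column-scaling and row-scaling description can be tried directly. In this sketch the matrix A and the diagonal entries 5, 3, 2 are assumptions, chosen because they reproduce the products AD and DA listed in the answer; the function names are illustrative.

```python
# Assumed data: a matrix A and the diagonal of D that reproduce the
# products AD and DA given in answer 11.
A = [[1, 2, 3],
     [2, 4, 5],
     [3, 5, 6]]
diagonal = [5, 3, 2]


def times_diagonal_on_right(matrix, diag):
    """AD: multiply column j of the matrix by the j-th diagonal entry."""
    return [[entry * diag[j] for j, entry in enumerate(row)] for row in matrix]


def times_diagonal_on_left(diag, matrix):
    """DA: multiply row i of the matrix by the i-th diagonal entry."""
    return [[diag[i] * entry for entry in row] for i, row in enumerate(matrix)]


print(times_diagonal_on_right(A, diagonal))
# [[5, 6, 6], [10, 12, 10], [15, 15, 12]]
print(times_diagonal_on_left(diagonal, A))
# [[5, 10, 15], [6, 12, 15], [6, 10, 12]]
```

Neither function forms the full matrix D, which reflects the point of the answer: multiplying by a diagonal matrix only rescales columns (on the right) or rows (on the left).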


13. Hint: One of the two matrices is Q. 


15. Answer the questions before looking in the Study Guide. 

17. bi = 

19. The third column of AB is the sum of the first two columns
of AB. Here’s why. Write $B = [\mathbf{b}_1 \ \mathbf{b}_2 \ \mathbf{b}_3]$. By
definition, the third column of AB is $A\mathbf{b}_3$. If $\mathbf{b}_3 = \mathbf{b}_1 + \mathbf{b}_2$,
then $A\mathbf{b}_3 = A(\mathbf{b}_1 + \mathbf{b}_2) = A\mathbf{b}_1 + A\mathbf{b}_2$, by a property of
matrix-vector multiplication.



21. The columns of A are linearly dependent. Why? 

23. Hint: Suppose x satisfies Ax = 0, and show that x must be 

0. 
25. Hint: Use the results of Exercises 23 and 24, and apply the
associative law of multiplication to the product CAD.

27. $\mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u} = -3a + 2b - 5c$,

$\mathbf{u}\mathbf{v}^T = \begin{bmatrix} -3a & -3b & -3c \\ 2a & 2b & 2c \\ -5a & -5b & -5c \end{bmatrix}$,

$\mathbf{v}\mathbf{u}^T = \begin{bmatrix} -3a & 2a & -5a \\ -3b & 2b & -5b \\ -3c & 2c & -5c \end{bmatrix}$


29. Hint: For Theorem 2(b), show that the $(i, j)$-entry of
$A(B + C)$ equals the $(i, j)$-entry of $AB + AC$.

31. Hint: Use the definition of the product $I_mA$ and the fact that
$I_m\mathbf{x} = \mathbf{x}$ for $\mathbf{x}$ in $\mathbb{R}^m$.

33. Hint: First write the $(i, j)$-entry of $(AB)^T$, which is the
$(j, i)$-entry of $AB$. Then, to compute the $(i, j)$-entry in
$B^TA^T$, use the facts that the entries in row $i$ of $B^T$ are
$b_{1i}, \ldots, b_{ni}$, because they come from column $i$ of $B$, and
the entries in column $j$ of $A^T$ are $a_{j1}, \ldots, a_{jn}$, because they
come from row $j$ of $A$.

35. [M] The answer here depends on the choice of matrix 
program. For MATLAB, use the help command to read 
about zeros, ones, eye, and diag. For the TI-86, 
study the dim, fill, and iden instructions. The TI-86 
does not have a “diagonal” command. 

37. [M] Display your results and report your conclusions. 

39. [M] Display your results and report your conclusions. 

41. The matrices appear to approach the matrix

$\begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{bmatrix}$
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2 -3' 

, 1 

'-3 -3" 


1 1 

-5/2 4_ 

3. — 

3 

6 7_ 

or 

-2 -7/3 ■ 


5. x\ =1 and X2 = —9 


-9 


11 


6 

,and 

13 

4 


-5 


-2 

-5 


9. Write out your answers before checking the Study Guide. 

11. The proof can be modeled after the proof of Theorem 5. 

13. $AB = AC \Rightarrow A^{-1}AB = A^{-1}AC \Rightarrow IB = IC \Rightarrow B = C$.
No, in general, B and C can be different when A is not
invertible. See Exercise 10 in Section 2.1.

15. Hint: Apply the elementary matrices used to row reduce A
to I, to the matrix $[A \ \ B]$.

17. $D = C^{-1}B^{-1}A^{-1}$. Show that D works.

19. After you find X = CB − A, show that X is a solution.

21. Hint: Consider the equation Ax = 0. 


23. Hint: If $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, then there are
no free variables in the equation $A\mathbf{x} = \mathbf{0}$, and each column
of $A$ is a pivot column.

25. Hint: Consider the case $a = b = 0$. Then consider the
vector $\begin{bmatrix} -b \\ a \end{bmatrix}$, and use the fact that $ad - bc = 0$.


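27. Hint: For part (a), interchange $A$ and $B$ in the box
following Example 6 in Section 2.1, and then replace $B$ by
the identity matrix. For parts (b) and (c), begin by writing
$A = \begin{bmatrix} \text{row}_1(A) \\ \text{row}_2(A) \\ \text{row}_3(A) \end{bmatrix}$

29. $\dfrac{1}{3}\begin{bmatrix} -9 & 3 \\ -4 & 1 \end{bmatrix}$

31. $\begin{bmatrix} 8 & 3 & 1 \\ 10 & 4 & 1 \\ 7/2 & 3/2 & 1/2 \end{bmatrix}$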

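33. The general form of $A^{-1}$ is

$A^{-1} = B = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -1 & 1 & 0 & & 0 \\ 0 & -1 & 1 & & \vdots \\ \vdots & & \ddots & \ddots & \\ 0 & 0 & \cdots & -1 & 1 \end{bmatrix}$

Hint: For $j = 1, \ldots, n$, let $\mathbf{a}_j$, $\mathbf{b}_j$, and $\mathbf{e}_j$ denote the $j$th
columns of $A$, $B$, and $I$, respectively. Use the facts that
$\mathbf{a}_j - \mathbf{a}_{j+1} = \mathbf{e}_j$ and $\mathbf{b}_j = \mathbf{e}_j - \mathbf{e}_{j+1}$ for $j = 1, \ldots, n - 1$,
and $\mathbf{a}_n = \mathbf{b}_n = \mathbf{e}_n$.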


35. 


Find this by row reducing $[\,A \;\; \mathbf{e}_3\,]$.


37. C 


1 1 

-1 1 


39. [M] The deflections are .62, .66, and .52 inches, 
respectively. 


41. [M] .95, 6.19, 11.43, and 3.81 newtons, respectively 


Section 2.3, page 115 

The abbreviation IMT (here and in the Study Guide) denotes the 
Invertible Matrix Theorem (Theorem 8). 

1. Invertible, by the IMT. Neither column of the matrix is a 
multiple of the other column, so they are linearly 
independent. Also, the matrix is invertible by Theorem 4 in 
Section 2.2 because the determinant is nonzero. 

3. Notice that $A^T$ has a pivot in every column, so by the IMT, $A^T$
is invertible. Hence by the IMT, $A$ is also invertible.

5. Not invertible, by the IMT. Since this matrix has a column 
of zeros, its columns form a linearly dependent set and 
hence the matrix is not invertible. 
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7. Invertible, by the IMT. The matrix row reduces to 

$\begin{bmatrix} -1 & -3 & 0 & 1 \\ 0 & -4 & 8 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

and has four pivot positions. 

9. [M] The 4x4 matrix has four pivot positions, so it is 
invertible by the IMT. 

11. The Study Guide will help, but first try to answer the 
questions based on your careful reading of the text. 

13. A square upper triangular matrix is invertible if and only if 
all the entries on the diagonal are nonzero. Why? 

Note: The answers below for Exercises 15-29 mention the IMT. 

In many cases, part or all of an acceptable answer could also be 

based on results that were used to establish the IMT. 

15. No, because statement (h) of the IMT is then false. A $4 \times 4$
matrix cannot be invertible when its columns do not span
$\mathbb{R}^4$.

17. If $A$ has two identical columns, then its columns are linearly
dependent. Part (e) of the IMT shows that $A$ cannot be
invertible.

19. By statement (e) of the IMT, $D$ is invertible. Thus the
equation $D\mathbf{x} = \mathbf{b}$ has a solution for each $\mathbf{b}$ in $\mathbb{R}^7$, by
statement (g) of the IMT. Can you say more?

21. The matrix $C$ cannot be invertible, by Theorem 5 in Section
2.2 or by the paragraph following the IMT. So statement (g)
of the IMT is false and so is (h). The columns of $C$ do not
span $\mathbb{R}^n$.

23. Statement (g) of the IMT is false for $F$, so statement (d) is
false, too. That is, the equation $F\mathbf{x} = \mathbf{0}$ has a nontrivial
solution.

25. Hint: Use the IMT first. 

27. Let $W$ be the inverse of $AB$. Then $ABW = I$ and
$A(BW) = I$. Unfortunately, this equation by itself does
not prove that $A$ is invertible. Why not? Finish the proof
before you check the Study Guide.

29. Since the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is one-to-one, statement
(f) of the IMT is true. Then statement (i) is also true and the
transformation $\mathbf{x} \mapsto A\mathbf{x}$ maps $\mathbb{R}^n$ onto $\mathbb{R}^n$. Also, $A$ is
invertible, which implies that the transformation $\mathbf{x} \mapsto A\mathbf{x}$ is
invertible, by Theorem 9.

31. Hint: If the equation $A\mathbf{x} = \mathbf{b}$ has a solution for each $\mathbf{b}$, then
$A$ has a pivot in each row (Theorem 4 in Section 1.4).
Could there be free variables in an equation $A\mathbf{x} = \mathbf{b}$?

33. Hint: First show that the standard matrix of $T$ is invertible.
Then use a theorem or theorems to show that
$T^{-1}(\mathbf{x}) = B\mathbf{x}$, where $B = \begin{bmatrix} 7 & 9 \\ 4 & 5 \end{bmatrix}$.


35. Hint: To show that $T$ is one-to-one, suppose that
$T(\mathbf{u}) = T(\mathbf{v})$ for some vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$. Deduce that
$\mathbf{u} = \mathbf{v}$. To show that $T$ is onto, suppose $\mathbf{y}$ represents an
arbitrary vector in $\mathbb{R}^n$ and use the inverse $S$ to produce an $\mathbf{x}$
such that $T(\mathbf{x}) = \mathbf{y}$. A second proof can be given using
Theorem 9 together with a theorem from Section 1.9.

37. Hint: Consider the standard matrices of $T$ and $U$.

39. If $T$ maps $\mathbb{R}^n$ onto $\mathbb{R}^n$, then the columns of its standard
matrix $A$ span $\mathbb{R}^n$ by Theorem 12 in Section 1.9. By the
IMT, $A$ is invertible. Hence, by Theorem 9, $T$ is invertible,
and $A^{-1}$ is the standard matrix of $T^{-1}$. Since $A^{-1}$ is also
invertible, by the IMT, its columns are linearly independent
and span $\mathbb{R}^n$. Applying Theorem 12 in Section 1.9 to the
transformation $T^{-1}$ shows that $T^{-1}$ is a one-to-one
mapping of $\mathbb{R}^n$ onto $\mathbb{R}^n$.

41. [M] 

a. The exact solution of (3) is $x_1 = 3.94$ and $x_2 = .49$. The
exact solution of (4) is $x_1 = 2.90$ and $x_2 = 2.00$.

b. When the solution of (4) is used as an approximation for
the solution in (3), the error in using the value of 2.90
for $x_1$ is about 26%, and the error in using 2.0 for $x_2$ is
about 308%.

c. The condition number of the coefficient matrix is 3363. 
The percentage change in the solution from (3) to (4) is 
about 7700 times the percentage change in the right side 
of the equation. This is the same order of magnitude as 
the condition number. The condition number gives a 
rough measure of how sensitive the solution of Ax = b 
can be to changes in b. Further information about the 
condition number is given at the end of Chapter 6 and in 
Chapter 7. 

43. [M] cond($A$) $\approx$ 69,000, which is between $10^4$ and $10^5$. So
about 4 or 5 digits of accuracy may be lost. Several
experiments with MATLAB should verify that $\mathbf{x}$ and $\mathbf{x}_1$
agree to 11 or 12 digits.

45. [M] Some versions of MATLAB issue a warning when
asked to invert a Hilbert matrix of order about 12 or larger
using floating-point arithmetic. The product $AA^{-1}$ should
have several off-diagonal entries that are far from being
zero. If not, try a larger matrix.

Section 2.4, page 121 


1. $\begin{bmatrix} A & B \\ EA + C & EB + D \end{bmatrix}$

3. $\begin{bmatrix} C & D \\ A & B \end{bmatrix}$

5. $Y = B^{-1}$ (explain why), $X = -B^{-1}A$, $Z = C$

7. $X = A^{-1}$ (why?), $Y = -BA^{-1}$, $Z = 0$ (why?)

9. $A_{21} = -B_{21}B_{11}^{-1}$, $A_{31} = -B_{31}B_{11}^{-1}$,
$C_{22} = B_{22} - B_{21}B_{11}^{-1}B_{12}$

11. You can check your answers in the Study Guide.

13. Hint: Suppose $A$ is invertible, and let $A^{-1} = \begin{bmatrix} D & E \\ F & G \end{bmatrix}$.
Show that $BD = I$ and $CG = I$. This implies that $B$ and














19. Hint: Think about row reducing [ A I ]. 

21. Hint: Represent the row operations by a sequence of 
elementary matrices. 

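23. a. Denote the rows of $D$ as transposes of column vectors.
Then partitioned matrix multiplication yields

$A = CD = [\,\mathbf{c}_1 \;\cdots\; \mathbf{c}_4\,]\begin{bmatrix} \mathbf{v}_1^T \\ \vdots \\ \mathbf{v}_4^T \end{bmatrix} = \mathbf{c}_1\mathbf{v}_1^T + \cdots + \mathbf{c}_4\mathbf{v}_4^T$

b. $A$ has 40,000 entries. Since $C$ has 1600 entries and $D$
has 400 entries, together they occupy only 5% of the
memory needed to store $A$.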
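
The column-row expansion in part (a) and the storage count in part (b) can be checked numerically. The sketch below is only an illustration: the small matrices `C` and `D` are made-up values, and the shapes 400 x 4 and 4 x 100 used for the storage count are assumptions inferred from the entry counts quoted above (1600 and 400 entries, four columns).

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def outer(c, r):
    """Outer product c r^T of a column c and a row r."""
    return [[ci * rj for rj in r] for ci in c]

def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical small example: C is 3 x 2, D is 2 x 3.
C = [[1, 2], [0, 1], [3, -1]]
D = [[2, 0, 1], [1, 4, -2]]

# Sum of (column k of C) times (row k of D).
expansion = [[0] * len(D[0]) for _ in C]
for c_k, v_k in zip(zip(*C), D):
    expansion = add(expansion, outer(c_k, v_k))

def storage_fraction(m, n, r):
    """Entries needed for an m x r and an r x n factor, relative to m x n."""
    return (m * r + r * n) / (m * n)

print(expansion == matmul(C, D))
print(storage_fraction(400, 100, 4))
```

The same comparison applies to any rank-r factorization: the factors need r(m + n) entries instead of mn.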



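21. a. $A^2 = \begin{bmatrix} 1 + 0 & 0 + 0 \\ 2 - 2 & 0 + (-1)^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

b. $M^2 = \begin{bmatrix} A & 0 \\ I & -A \end{bmatrix}\begin{bmatrix} A & 0 \\ I & -A \end{bmatrix} = \begin{bmatrix} A^2 + 0 & 0 + 0 \\ A - A & 0 + (-A)^2 \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}$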

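23. If $A_1$ and $B_1$ are $(k + 1) \times (k + 1)$ and lower triangular,
then write $A_1 = \begin{bmatrix} a & \mathbf{0}^T \\ \mathbf{v} & A \end{bmatrix}$ and $B_1 = \begin{bmatrix} b & \mathbf{0}^T \\ \mathbf{w} & B \end{bmatrix}$, where
$A$ and $B$ are $k \times k$ and lower triangular, $\mathbf{v}$ and $\mathbf{w}$ are in $\mathbb{R}^k$,
and $a$ and $b$ are suitable scalars. Assume that the product of
$k \times k$ lower triangular matrices is lower triangular, and
compute the product $A_1B_1$. What do you conclude?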

25. Use Exercise 13 to find the inverse of a matrix of the form
$B = \begin{bmatrix} B_{11} & 0 \\ 0 & B_{22} \end{bmatrix}$, where $B_{11}$ is $p \times p$, $B_{22}$ is $q \times q$, and
$B$ is invertible. Partition the matrix $A$, and apply your result
twice to find that


27. a., b. The commands to be used in these exercises will 


and B\ = 



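The equation $A_{21}\mathbf{x}_1 + A_{22}\mathbf{x}_2 = \mathbf{b}_2$ yields
$A_{22}\mathbf{x}_2 = \mathbf{b}_2 - A_{21}\mathbf{x}_1$, which can be solved for $\mathbf{x}_2$ by row
reducing the matrix $[\,A_{22} \;\; \mathbf{c}\,]$, where $\mathbf{c} = \mathbf{b}_2 - A_{21}\mathbf{x}_1$.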
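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1. $L\mathbf{y} = \mathbf{b} \Rightarrow \mathbf{y} = \begin{bmatrix} -7 \\ -2 \\ 6 \end{bmatrix}$, $U\mathbf{x} = \mathbf{y} \Rightarrow \mathbf{x} = \begin{bmatrix} 3 \\ 4 \\ -6 \end{bmatrix}$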
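
The two-stage solve (first Ly = b, then Ux = y) can be sketched with forward and back substitution. The factors `L` and `U` and the right side `b` below are assumptions chosen for illustration; they are not printed in this answer key, although they do reproduce the vectors y = (-7, -2, 6) and x = (3, 4, -6).

```python
from fractions import Fraction

def forward_substitute(L, b):
    """Solve L y = b for lower triangular L, working from the top row down."""
    y = []
    for i in range(len(b)):
        s = sum(L[i][j] * y[j] for j in range(i))
        y.append((Fraction(b[i]) - s) / L[i][i])
    return y

def back_substitute(U, y):
    """Solve U x = y for upper triangular U, working from the bottom row up."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Assumed data (hypothetical, for illustration only).
L = [[1, 0, 0], [-1, 1, 0], [2, -5, 1]]
U = [[3, -7, -2], [0, -2, -1], [0, 0, -1]]
b = [-7, 5, 2]

y = forward_substitute(L, b)
x = back_substitute(U, y)
print([int(v) for v in y], [int(v) for v in x])
```

Exact fractions are used so that the result can be compared with hand computation without rounding.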


3. y 


6 

12 

0 


,x 


5. y ： 


,x : 


38 
16 
8.5 
一 4 


depend on the matrix program, 
c. The algebra needed comes from the block matrix 
equation 


'A n 

0 

"xi" 


bi 

_ ^21 

^22 _ 

_x 2 _ 


_ 


where Xi and bi are in R 20 and X 2 and \)2 are in R 30 . 
Then ^nXi = bi, which can be solved to produce Xi. 


C are invertible. (Explain why!) Conversely, suppose B 
and C are invertible. To prove that A is invertible, guess 
what A~ l must be and check that it works. 


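15. $G_{k+1} = [\,X_k \;\; \mathbf{x}_{k+1}\,]\begin{bmatrix} X_k^T \\ \mathbf{x}_{k+1}^T \end{bmatrix} = X_kX_k^T + \mathbf{x}_{k+1}\mathbf{x}_{k+1}^T = G_k + \mathbf{x}_{k+1}\mathbf{x}_{k+1}^T$

Only the outer product matrix $\mathbf{x}_{k+1}\mathbf{x}_{k+1}^T$ needs to be
computed (and then added to $G_k$).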
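
A minimal sketch of this rank-one update, assuming the data vectors are stored as a list of columns (the sample vectors are invented for illustration):

```python
def gram(columns):
    """Return X X^T, where the given vectors are the columns of X."""
    n = len(columns[0])
    return [[sum(x[i] * x[j] for x in columns) for j in range(n)] for i in range(n)]

def outer(x, y):
    """Outer product x y^T."""
    return [[a * b for b in y] for a in x]

def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical data: two existing columns and one newly arrived column.
old_columns = [[1, 2], [0, 1]]
x_new = [3, -1]

G_k = gram(old_columns)
G_next = add(G_k, outer(x_new, x_new))   # update without recomputing X X^T

print(G_next)
print(G_next == gram(old_columns + [x_new]))
```

The update costs one outer product and one matrix addition, independent of how many columns have already been processed.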


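17. The inverse of $\begin{bmatrix} I & 0 \\ X & I \end{bmatrix}$ is $\begin{bmatrix} I & 0 \\ -X & I \end{bmatrix}$. Similarly, $\begin{bmatrix} I & Y \\ 0 & I \end{bmatrix}$
has an inverse. From equation (7), one obtains

$\begin{bmatrix} I & 0 \\ -X & I \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} I & -Y \\ 0 & I \end{bmatrix} = \begin{bmatrix} A_{11} & 0 \\ 0 & S \end{bmatrix} \qquad (*)$

If $A$ is invertible, then the matrix on the right side of (*) is a
product of invertible matrices and hence is invertible. By
Exercise 13, $A_{11}$ and $S$ must be invertible.

19. $W(s) = I_m - C(A - sI_n)^{-1}B$. This is the Schur
complement of $A - sI_n$ in the system matrix.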


2 

7/ 


2 2 4 

13 0 


o 


3 0 0 2 

2 0 0 5 
I 

000 o 


00 


/2 


00 000 

1_1 1_1 

1 1 1 1 

o o 1 o o 1 o o 1 

o 11 o 11 11 o 11 


00 4 4 

-/ / o 

- 3 1 


2 1 / 

II 

I I 

4 4 2 

/ / / 


V. 


4 2 13 2 


L 

7 . 


9 . 


11 


13 


15 


f/ 

7 . 

1 


L 




o il 



01 

I 

11 ox- 


2 

a. 

21 . 




0 0 0 3 2 
5/ 


2 10 0 0 
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25. Explain why U, D, and V T are invertible. Then use a 
theorem on the inverse of a product of invertible matrices. 

27. a. 


7 . Sim 


./'l 


i i 

l 2 9 l 2 


A 

V 1 

1/2 ohm 

V 2 

> 9/2 

5 ohms 

V 3 


_ 


_ 



1.6 111.6 
1.2」 • [ 121.2 

c. Hint: Find a formula involving $(I - C)^{-1}$. See the
Study Guide.


9. x 


82.8 

131.0 

110.3 


b. 




’2. :2 


, ,3 

V 1 

- 

:。 h 6 ms 

V 2 

3/4 ohm 

V 3 


- 

_ 


_ 



29. a. $\begin{bmatrix} 1 + R_3/R_2 & -R_1 - R_3 - (R_1R_3)/R_2 \\ -1/R_2 & 1 + R_1/R_2 \end{bmatrix}$


b. A 


Set 


31. [M] 


a. L 


3 - 

-12 



-1/3 

5/3 ■ 



1 -6" 

"1 

0" 

"1 -2" 

0 1 

-1/3 1_ 

0 1 

= 2 ohms, R 2 

= 3 ohms, and 

1 0 

0 

0 

0 0 

-.25 1 

0 

0 

0 0 

-.25 -.0667 1 

0 

0 0 

0 -.2667 

—.2857 

1 

0 0 

0 0 

一 .2679 

-.0833 

1 0 


6 ohms 


0 

0 

L 0 


-.2917 

0 

0 


-.2921 

-.2697 

0 


1 

-.0861 

-.2948 


0 

0 

0 

0 

0 

0 

1 

.2931 


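11. Hint: Use properties of transposes to obtain
$\mathbf{p}^T = \mathbf{p}^TC + \mathbf{v}^T$, so that
$\mathbf{p}^T\mathbf{x} = (\mathbf{p}^TC + \mathbf{v}^T)\mathbf{x} = \mathbf{p}^TC\mathbf{x} + \mathbf{v}^T\mathbf{x}$. Now compute $\mathbf{p}^T\mathbf{x}$
from the production equation.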

13. [M] x = 

(99576, 97703,51231,131570,49488,329554,13835). 

The entries in x suggest more precision in the answer than 
is warranted by the entries in d, which appear to be accurate 
only to perhaps the nearest thousand. So a more realistic 
answer for x might be 
x = 1000 x (100, 98,51,132,49, 330,14). 

15. [M] $\mathbf{x}^{(12)}$ is the first vector whose entries are accurate to the
nearest thousand. The calculation of $\mathbf{x}^{(12)}$ takes about
1260 flops, while row reduction of $[\,(I - C) \;\; \mathbf{d}\,]$ takes only
about 550 flops. If $C$ is larger than $20 \times 20$, then fewer
flops are needed to compute $\mathbf{x}^{(12)}$ by iteration than to
compute the equilibrium vector $\mathbf{x}$ by row reduction. As the
size of $C$ grows, the advantage of the iterative method
increases. Also, because $C$ becomes more sparse for larger
models of the economy, fewer iterations are needed for
reasonable accuracy.
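
The iteration referred to here can be sketched as follows, assuming it has the form x^(0) = d, x^(k+1) = d + C x^(k). The 2 x 2 consumption matrix `C` and demand vector `d` are invented for illustration and are not the data of the exercise.

```python
def iterate(C, d, steps):
    """Apply x <- d + C x the given number of times, starting from x = d."""
    x = list(d)
    for _ in range(steps):
        x = [d_i + sum(c * x_j for c, x_j in zip(row, x)) for d_i, row in zip(d, C)]
    return x

# Hypothetical consumption matrix (column sums below 1) and final demand.
C = [[0.2, 0.3],
     [0.4, 0.1]]
d = [50.0, 30.0]

x = iterate(C, d, 60)
print(x)  # approaches the solution of (I - C) x = d, here (90, 220/3)
```

For this small example the exact equilibrium is x = (90, 220/3), which can be confirmed by substituting into x = Cx + d.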


-4 

—1 

—1 

0 

0 

0 

0 

0 - 








0 

0 

3.75 

0 

-.25 

3.7333 

—1 

—1.0667 

0 

—1 

0 

0 

0 

0 

0 

0 
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0 

0 

0 

3.4286 

-.2857 

—1 

0 

0 








0 

0 

0 

0 

3.7083 

— 1.0833 

-1 

0 


1 

.25 

0" 


"0 

— 1 

-1" 

0 

0 

0 

0 

0 

3.3919 

-.2921 

—1 

1. 

0 

1 

0 

3. 

1 

0 

2 

0 

.0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

3.7052 

0 

- 1.0861 

3 . 3868 . 


0 

0 

1 


0 

0 

1 


U 


b. $\mathbf{x}$ = (3.9569, 6.5885, 4.2392, 7.3971, 5.6029, 8.7608, 9.4115, 12.0431)


c. $A^{-1} = \begin{bmatrix}
.2953 & .0866 & .0945 & .0509 & .0318 & .0227 & .0100 & .0082 \\
.0866 & .2953 & .0509 & .0945 & .0227 & .0318 & .0082 & .0100 \\
.0945 & .0509 & .3271 & .1093 & .1045 & .0591 & .0318 & .0227 \\
.0509 & .0945 & .1093 & .3271 & .0591 & .1045 & .0227 & .0318 \\
.0318 & .0227 & .1045 & .0591 & .3271 & .1093 & .0945 & .0509 \\
.0227 & .0318 & .0591 & .1045 & .1093 & .3271 & .0509 & .0945 \\
.0100 & .0082 & .0318 & .0227 & .0945 & .0509 & .2953 & .0866 \\
.0082 & .0100 & .0227 & .0318 & .0509 & .0945 & .0866 & .2953
\end{bmatrix}$


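5. $\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}$

7. $\begin{bmatrix} 1/2 & -\sqrt{3}/2 & 3 + 4\sqrt{3} \\ \sqrt{3}/2 & 1/2 & 4 - 3\sqrt{3} \\ 0 & 0 & 1 \end{bmatrix}$

See the Practice Problem.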
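
A matrix of this kind (a rotation about a point other than the origin, in homogeneous coordinates) can be built as translate, rotate, translate back. In the sketch below the center (6, 8) and the angle of 60 degrees are assumptions; they are not stated in this answer key, but they are consistent with the translation column shown in the second matrix.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def translation(h, k):
    """Homogeneous 3 x 3 matrix for (x, y) -> (x + h, y + k)."""
    return [[1, 0, h], [0, 1, k], [0, 0, 1]]

def rotation(theta):
    """Homogeneous 3 x 3 matrix for a rotation about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rotation_about(point, theta):
    """Translate the point to the origin, rotate, then translate back."""
    px, py = point
    return matmul(translation(px, py), matmul(rotation(theta), translation(-px, -py)))

M = rotation_about((6, 8), math.pi / 3)   # assumed center and angle
for row in M:
    print([round(v, 4) for v in row])
```

The center of rotation is a fixed point of the composite map, which gives a quick sanity check on the third column.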


Obtain $A^{-1}$ directly and then compute $A^{-1} - U^{-1}L^{-1}$
to compare the two methods for inverting a matrix.
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9. A(BD) requires 800 multiplications. (AB)D requires 408 
multiplications. The first method uses about twice as many 
multiplications. If D had 10,000 columns, the counts would 
be 80,000 and 40,008, respectively. 
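
These counts can be reproduced with a short tally. The shapes used below (A and B are 2 x 2, D is 2 x 100) are an assumption inferred from the numbers 800 and 408; the exercise statement itself is not reproduced here.

```python
def mults(m, n, p):
    """Scalar multiplications needed to multiply an m x n by an n x p matrix."""
    return m * n * p

def counts(columns_of_D):
    """Return (cost of A(BD), cost of (AB)D) for 2 x 2 matrices A and B."""
    a_bd = mults(2, 2, columns_of_D) + mults(2, 2, columns_of_D)  # BD, then A(BD)
    ab_d = mults(2, 2, 2) + mults(2, 2, columns_of_D)             # AB, then (AB)D
    return a_bd, ab_d

print(counts(100))
print(counts(10_000))
```

Grouping the two small matrices first means the wide matrix is touched only once, which is why the ratio approaches 2 as the number of columns grows.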


C 


3. x 


.10 .60 
.30 .20 

.30 .10 

44.44 一 
16.67 
16.67 


.60 

0 

.10 

5. x 


j intermediate I _ 
， j demand | 


60 

20 

10 


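11. Use the fact that
$\sec\varphi - \tan\varphi\sin\varphi = \dfrac{1}{\cos\varphi} - \dfrac{\sin^2\varphi}{\cos\varphi} = \cos\varphi$

110

120

13. $\begin{bmatrix} A & \mathbf{p} \\ \mathbf{0}^T & 1 \end{bmatrix} = \begin{bmatrix} I & \mathbf{p} \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} A & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix}$. First apply the linear
transformation $A$, and then translate by $\mathbf{p}$.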
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15. (12,-6,-3) 


17. $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1/2 & -\sqrt{3}/2 & 0 \\ 0 & \sqrt{3}/2 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

25. Basis for Col A



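19. The triangle with vertices at (7, 2, 0), (7.5, 5, 0), and
(5, 5, 0)

21. [M] $\begin{bmatrix} 2.2586 & -1.0395 & -.3473 \\ -1.3495 & 2.3441 & .0696 \\ .0910 & -.3046 & 1.2777 \end{bmatrix}\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} R \\ G \\ B \end{bmatrix}$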


Section 2.8, page 151 



Basis for Nul $A$: $\begin{bmatrix} 2 \\ -2.5 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\begin{bmatrix} -7 \\ .5 \\ 0 \\ -4 \\ 1 \end{bmatrix}$


27. Construct a nonzero 3x3 matrix A, and construct b to be 
almost any convenient linear combination of the columns of 
A. 


1. The set is closed under sums but not under multiplication
by a negative scalar. (Sketch an example.)

3. The set is not closed under sums or scalar multiples.

5. Yes. The system corresponding to $[\,\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{w}\,]$ is
consistent.

7. a. The three vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$

b. Infinitely many vectors

c. Yes, because $A\mathbf{x} = \mathbf{p}$ has a solution.

9. No, because $A\mathbf{p} \neq \mathbf{0}$.

11. $p = 4$ and $q = 3$. Nul $A$ is a subspace of $\mathbb{R}^4$ because
solutions of $A\mathbf{x} = \mathbf{0}$ must have four entries, to match the
columns of $A$. Col $A$ is a subspace of $\mathbb{R}^3$ because each
column vector has three entries.

13. For Nul $A$, choose $(1, -2, 1, 0)$ or $(-1, 4, 0, 1)$, for
example. For Col $A$, select any column of $A$.

15. Let $A$ be the matrix whose columns are the vectors given.
Then $A$ is invertible because its determinant is nonzero, and
so its columns form a basis for $\mathbb{R}^2$, by the IMT (or by
Example 5). (Other reasons for the invertibility of $A$ could
be given.)

17. Let $A$ be the matrix whose columns are the vectors given.
Row reduction shows three pivots, so $A$ is invertible. By
the IMT, the columns of $A$ form a basis for $\mathbb{R}^3$.

19. Let $A$ be the $3 \times 2$ matrix whose columns are the vectors
given. The columns of $A$ cannot possibly span $\mathbb{R}^3$ because
$A$ cannot have a pivot in every row. So the columns are not
a basis for $\mathbb{R}^3$. (They are a basis for a plane in $\mathbb{R}^3$.)

21. Read the section carefully, and write your answers before
checking the Study Guide. This section has terms and key
concepts that you must learn now before going on.

23.



29. Hint: You need a nonzero matrix whose columns are
linearly dependent.

31. If Col $F \neq \mathbb{R}^5$, then the columns of $F$ do not span $\mathbb{R}^5$.
Since $F$ is square, the IMT shows that $F$ is not invertible
and the equation $F\mathbf{x} = \mathbf{0}$ has a nontrivial solution. That is,
Nul $F$ contains a nonzero vector. Another way to describe
this is to write Nul $F \neq \{\mathbf{0}\}$.

33. If Nul $C = \{\mathbf{0}\}$, then the equation $C\mathbf{x} = \mathbf{0}$ has only the
trivial solution. Since $C$ is square, the IMT shows that $C$ is
invertible and the equation $C\mathbf{x} = \mathbf{b}$ has a solution for each
$\mathbf{b}$ in $\mathbb{R}^6$. Also, each solution is unique, by Theorem 5 in
Section 2.2.

35. If Nul $B$ contains nonzero vectors, then the equation
$B\mathbf{x} = \mathbf{0}$ has nontrivial solutions. Since $B$ is square, the IMT
shows that $B$ is not invertible and the columns of $B$ do not
span $\mathbb{R}^5$. So Col $B$ is a subspace of $\mathbb{R}^5$, but Col $B \neq \mathbb{R}^5$.

37. [M] Display the reduced echelon form of $A$, and select the
pivot columns of $A$ as a basis for Col $A$. For Nul $A$, write
the solution of $A\mathbf{x} = \mathbf{0}$ in parametric vector form.
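
The procedure just described (row reduce, take the pivot columns of A for Col A, and read a basis for Nul A off the free variables) can be sketched in exact arithmetic. The 3 x 3 matrix `A` below is a made-up example, not the matrix of the exercise, and the helper names are illustrative.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, list of pivot column indices)."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        # Look for a nonzero entry in column c at or below row r.
        pr = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def null_basis(rows):
    """One basis vector of Nul A per free variable (parametric vector form)."""
    R, pivots = rref(rows)
    n = len(R[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]
        basis.append(v)
    return basis

A = [[1, 2, 3], [2, 4, 7], [1, 2, 4]]      # hypothetical example
R, pivots = rref(A)
col_basis = [[row[p] for row in A] for p in pivots]   # pivot columns of A itself
basis = null_basis(A)
print(pivots, col_basis, basis)
```

Note that the basis for Col A is taken from the original matrix, not from its reduced echelon form.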



3" 



-5" 



Basis for Col A: 

-7 

-5 

, 


9 

7 




3 



-7 




[-2.51 


r 4.5] 


[-3.51 


-1.5 


2.5 


-1.5 

Basis for Nul A: 



, 

0 


, 

0 


0 


1 



0 


0 


0 



1 


Section 2.9, page 157 
1 . x = 3bi + 2 b 2 
■4 


3. 


2 


5. 


9. Basis for Col A: 




2 " 

-1 

7. 

[w] B = 

1 


2 


—6 

3 


1 


5 

2 

1 

-1 

, 

9 

5 


0 


14 


[x]. 


1.5 

.5 


,dim Col A 
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Basis for Nul A: 


11. Basis for Col A: 


Basis for Nul A: 


;dim Nul A = 1 


dim Col ^ = 2 


m. F n. T o. F p. T 

3. $I$

5. $A^2 = 2A - I$. Multiply by $A$: $A^3 = 2A^2 - A$.
Substitute $A^2 = 2A - I$:

$A^3 = 2(2A - I) - A = 3A - 2I$.

Multiply by $A$ again: $A^4 = A(3A - 2I) = 3A^2 - 2A$.
Substitute the identity $A^2 = 2A - I$ again:

$A^4 = 3(2A - I) - 2A = 4A - 3I$.
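
The pattern in this computation suggests A^n = nA - (n - 1)I whenever A^2 = 2A - I. A quick numerical check, using a hypothetical matrix chosen so that A^2 = 2A - I holds:

```python
def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def power(A, n):
    """Compute A^n for n >= 1 by repeated multiplication."""
    P = A
    for _ in range(n - 1):
        P = matmul(P, A)
    return P

def closed_form(A, n):
    """Return n*A - (n - 1)*I."""
    size = len(A)
    return [[n * A[i][j] - (n - 1) * (1 if i == j else 0) for j in range(size)]
            for i in range(size)]

A = [[3, -2], [2, -1]]   # assumed example; it satisfies A^2 = 2A - I

print(power(A, 2) == closed_form(A, 2))
print(all(power(A, n) == closed_form(A, n) for n in range(2, 7)))
```

This is a check on one example, not a proof; the general statement follows by induction from the substitution step shown above.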


-2 


-1 


-1 


「10 -ll 



1 


0 


0 

7. 

9 10 

9. 

-3 13 

0 

, 

0 


-1 

;dimNuM = 3 

-5 -3 


-8 27 

0 


1 


0 





0 


0 


1 

11. a. $p(x_i) = c_0 + c_1x_i + \cdots + c_{n-1}x_i^{\,n-1}$


13. The vectors $\mathbf{v}_1$, $\mathbf{v}_3$, and $\mathbf{v}_4$ form a basis for the given
subspace, $H$. So, dim $H$ = 3.

15. Col $A = \mathbb{R}^4$, because $A$ has a pivot in each row, and so the
columns of $A$ span $\mathbb{R}^4$. Nul $A$ cannot equal $\mathbb{R}^2$, because
Nul $A$ is a subspace of $\mathbb{R}^6$. It is true, however, that Nul $A$ is
two-dimensional. Reason: The equation $A\mathbf{x} = \mathbf{0}$ has two
free variables, because $A$ has six columns and only four of
them are pivot columns.

17. See the Study Guide after you write your justifications. 

19. The fact that the solution space of $A\mathbf{x} = \mathbf{0}$ has a basis of
three vectors means that dim Nul $A$ = 3. Since a $5 \times 7$
matrix $A$ has seven columns, the Rank Theorem shows that
rank $A$ = 7 - dim Nul $A$ = 4. See the Study Guide for a
justification that does not explicitly mention the Rank
Theorem.

21. A $9 \times 8$ matrix has eight columns. By the Rank Theorem,
dim Nul $A$ = 8 - rank $A$. Since the rank is 7,
dim Nul $A$ = 1. That is, the dimension of the solution space
of $A\mathbf{x} = \mathbf{0}$ is 1.

23. Create a 3 x 5 matrix A with two pivot columns. The 
remaining three columns will correspond to free variables 
in the equation Ax = 0. So the desired construction is 
possible. 

25. The p columns of A span Col A by definition. If 

dim Col A = p, then the spanning set of p columns is 
automatically a basis for Col A, by the Basis Theorem. In 
particular, the columns are linearly independent. 

27. a. Hint: The columns of $B$ span $W$, and each vector $\mathbf{a}_j$ is
in $W$. The vector $\mathbf{c}_j$ is in $\mathbb{R}^p$ because $B$ has $p$ columns.

b. Hint: What is the size of $C$?

c. Hint: How are $B$ and $C$ related to $A$?

29. [M] Your calculations should show that the matrix
$[\,\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{x}\,]$ corresponds to a consistent system. The
$\mathcal{B}$-coordinate vector of $\mathbf{x}$ is $(2, -1)$.

Chapter 2 Supplementary Exercises, page 160 


a. T  b. F  c. T  d. F  e. F  f. F
g. T  h. T  i. T  j. F  k. T  l. F


$= \text{row}_i(V)\begin{bmatrix} c_0 \\ \vdots \\ c_{n-1} \end{bmatrix} = \text{row}_i(V\mathbf{c}) = y_i$


b. Suppose $x_1, \ldots, x_n$ are distinct, and suppose $V\mathbf{c} = \mathbf{0}$ for
some vector $\mathbf{c}$. Then the entries in $\mathbf{c}$ are the coefficients
of a polynomial whose value is zero at the distinct
points $x_1, \ldots, x_n$. However, a nonzero polynomial of
degree $n - 1$ cannot have $n$ zeros, so the polynomial
must be identically zero. That is, the entries in $\mathbf{c}$ must
all be zero. This shows that the columns of $V$ are
linearly independent.

c. Hint: When $x_1, \ldots, x_n$ are distinct, there is a vector $\mathbf{c}$
such that $V\mathbf{c} = \mathbf{y}$. Why?

13. a. $P^2 = (\mathbf{u}\mathbf{u}^T)(\mathbf{u}\mathbf{u}^T) = \mathbf{u}(\mathbf{u}^T\mathbf{u})\mathbf{u}^T = \mathbf{u}(1)\mathbf{u}^T = P$

b. $P^T = (\mathbf{u}\mathbf{u}^T)^T = \mathbf{u}^{TT}\mathbf{u}^T = \mathbf{u}\mathbf{u}^T = P$

c. $Q^2 = (I - 2P)(I - 2P) = I - I(2P) - 2PI + 2P(2P) = I - 4P + 4P^2 = I$, because of part (a).


15. Left-multiplication by an elementary matrix produces an
elementary row operation:

$B \sim E_1B \sim E_2E_1B \sim E_3E_2E_1B = C$

So $B$ is row equivalent to $C$. Since row operations are
reversible, $C$ is row equivalent to $B$. (Alternatively, show
$C$ being changed into $B$ by row operations using the
inverses of the $E_i$.)

17. Since $B$ is $4 \times 6$ (with more columns than rows), its six
columns are linearly dependent and there is a nonzero $\mathbf{x}$
such that $B\mathbf{x} = \mathbf{0}$. Thus $AB\mathbf{x} = A\mathbf{0} = \mathbf{0}$, which shows that
the matrix $AB$ is not invertible, by the Invertible Matrix
Theorem.

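19. [M] To four decimal places, as $k$ increases,

$A^k \to \begin{bmatrix} .2857 & .2857 & .2857 \\ .4286 & .4286 & .4286 \\ .2857 & .2857 & .2857 \end{bmatrix}$ and $B^k \to \begin{bmatrix} .2022 & .2022 & .2022 \\ .3708 & .3708 & .3708 \\ .4270 & .4270 & .4270 \end{bmatrix}$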




























or, in rational format. 


Chapter 3 

Section 3.1, page 167 

1. 1 3. -5 5. -23 7. 4 

9. 10. Start with row 3. 

11. —12. Start with column 1 or row 4. 

13. 6. Start with row 2 or column 2. 

15. 1 17. -5 

19. $ad - bc$, $cb - da$. Interchanging two rows changes the
sign of the determinant.

21. $-2$, $(18 + 12k) - (20 + 12k) = -2$. A row replacement
does not change the value of a determinant.

23. $-5$, $k(4) - k(2) + k(-7) = -5k$. Scaling a row by a
constant $k$ multiplies the determinant by $k$.

25. 1  27. $k$  29. $-1$

31. 1. The matrix is upper or lower triangular, with only 1's on
the diagonal. The determinant is 1, the product of the
diagonal entries.


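33. $\det EA = \det\begin{bmatrix} c & d \\ a & b \end{bmatrix} = cb - ad = (-1)(ad - bc) = (\det E)(\det A)$

35. $\det EA = \det\begin{bmatrix} a + kc & b + kd \\ c & d \end{bmatrix} = (a + kc)d - (b + kd)c$
$= ad + kcd - bc - kdc = (+1)(ad - bc)$
$= (\det E)(\det A)$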


37. 5^ 


;no 


39. Hints are in the Study Guide. 

41. The area of the parallelogram and the determinant of
$[\,\mathbf{u} \;\; \mathbf{v}\,]$ both equal 6. If $\mathbf{v} = \begin{bmatrix} x \\ 2 \end{bmatrix}$ for any $x$, the area is still
6. In each case the base of the parallelogram is unchanged,
and the altitude remains 2 because the second coordinate of
$\mathbf{v}$ is always 2.

43. [M] In general, $\det(A + B)$ is not equal to $\det A + \det B$.

45. [M] You can check your conjectures when you get to 
Section 3.2. 


15 5 

20 10 


Section 3.2, page 175 

1. Interchanging two rows reverses the sign of the 
determinant. 

3. A row replacement operation does not change the 
determinant. 


5. 3  7. 0  9. 3  11. 120

13. 6  15. 35  17. -7  19. 14

21. Invertible

23. Not invertible

25. Linearly independent

27. See the Study Guide.

29. -32

31. Hint: Show that $(\det A)(\det A^{-1}) = 1$.

33. Hint: Use Theorem 6.

35. Hint: Use Theorem 6 and another theorem.

37. $\det AB = 24$; $(\det A)(\det B) = 3 \cdot 8 = 24$

39. a. -12  b. 500  c. -3  d. $\frac{1}{4}$  e. 64

41. $\det A = (a + e)d - (b + f)c = ad + ed - bc - fc$
$= (ad - bc) + (ed - fc) = \det B + \det C$

43. Hint: Compute $\det A$ by a cofactor expansion down column
3.

45. [M] See the Study Guide after you have made a conjecture
about $A^TA$ and $AA^T$.


Section 3.3, page 184 


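1. $\begin{bmatrix} 5/6 \\ -1/6 \end{bmatrix}$

3. $\begin{bmatrix} 4 \\ 5/2 \end{bmatrix}$

5. $\begin{bmatrix} 3/2 \\ 4 \\ -7/2 \end{bmatrix}$

7. $s \neq \pm\sqrt{3}$; $x_1 = \dfrac{5s + 4}{6(s^2 - 3)}$, $x_2 = \dfrac{-4s - 15}{4(s^2 - 3)}$

9. $s \neq 0, -1$; $x_1 = \dfrac{1}{3(s + 1)}$, $x_2 = \dfrac{4s + 3}{6s(s + 1)}$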




■ 0 

1 

0" 


1 

0 

1 

0" 

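11. adj $A = \begin{bmatrix} 0 & 1 & 0 \\ -3 & -1 & -3 \\ 3 & 2 & 6 \end{bmatrix}$, $A^{-1} = \dfrac{1}{3}\begin{bmatrix} 0 & 1 & 0 \\ -3 & -1 & -3 \\ 3 & 2 & 6 \end{bmatrix}$

13. adj $A = \begin{bmatrix} -1 & -1 & 5 \\ 1 & -5 & 1 \\ 1 & 7 & -5 \end{bmatrix}$, $A^{-1} = \dfrac{1}{6}\begin{bmatrix} -1 & -1 & 5 \\ 1 & -5 & 1 \\ 1 & 7 & -5 \end{bmatrix}$

15. adj $A = \begin{bmatrix} 2 & 0 & 0 \\ 2 & 6 & 0 \\ -1 & -9 & 3 \end{bmatrix}$, $A^{-1} = \dfrac{1}{6}\begin{bmatrix} 2 & 0 & 0 \\ 2 & 6 & 0 \\ -1 & -9 & 3 \end{bmatrix}$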

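17. If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then $C_{11} = d$, $C_{12} = -c$, $C_{21} = -b$,
$C_{22} = a$. The adjugate matrix is the transpose of cofactors:
adj $A = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$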


$A^k \to \begin{bmatrix} 2/7 & 2/7 & 2/7 \\ 3/7 & 3/7 & 3/7 \\ 2/7 & 2/7 & 2/7 \end{bmatrix}$ and $B^k \to \begin{bmatrix} 18/89 & 18/89 & 18/89 \\ 33/89 & 33/89 & 33/89 \\ 38/89 & 38/89 & 38/89 \end{bmatrix}$
















































Following Theorem 8 , we divide by det A; this produces the 
formula from Section 2.2. 
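
The inverse formula (adjugate divided by determinant) can be sketched with cofactor expansion in exact arithmetic. The 3 x 3 matrix `A` below is an assumed example chosen for illustration; the function names are not from the text.

```python
from fractions import Fraction

def minor(M, i, j):
    """Delete row i and column j of M."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion across the first row."""
    if not M:
        return 1  # determinant of the empty (0 x 0) matrix
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adjugate(M):
    """Transpose of the matrix of cofactors: entry (i, j) is the (j, i)-cofactor."""
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)] for i in range(n)]

def inverse(M):
    """Inverse via adj(M) / det(M); raises if M is singular."""
    d = det(M)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(v, d) for v in row] for row in adjugate(M)]

print(adjugate([[1, 2], [3, 4]]))          # 2 x 2 case: [[d, -b], [-c, a]]

A = [[0, -2, -1], [3, 0, 0], [-1, 1, 1]]   # hypothetical example
print(det(A), adjugate(A))
```

Cofactor expansion grows factorially with the matrix size, so this is suitable only for small matrices; row reduction is far cheaper in practice.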

19. 8 21. 14 23. 22 

25. A 3 x 3 matrix A is not invertible if and only if its columns 
are linearly dependent (by the Invertible Matrix Theorem). 

This happens if and only if one of the columns is in the 
plane spanned by the other two columns, which is 
equivalent to the condition that the parallelepiped 
determined by these columns has zero volume, which in 
turn is equivalent to the condition that det ^4 = 0. 

27. 24  29. $\frac{1}{2}\,|\det[\,\mathbf{v}_1 \;\; \mathbf{v}_2\,]|$

31. a. See Example 5.  b. $4\pi abc/3$

33. [M] In MATLAB, the entries in $B$ - inv($A$) are
approximately $10^{-15}$ or smaller. See the Study Guide for
suggestions that may save you keystrokes as you work.

35. [M] MATLAB Student Version 4.0 uses 57,771 flops for
inv($A$), and 14,269,045 flops for the inverse formula. The
inv($A$) command requires only about 0.4% of the
operations for the inverse formula. The Study Guide shows
how to use the flops command.

Chapter 3 Supplementary Exercises, page 185 


The solution for Exercise 3 is based on the fact that if a matrix 
contains two rows (or two columns) that are multiples of each 
other, then the determinant of the matrix is zero, by Theorem 4, 
because the matrix cannot be invertible. 

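3. Make two row replacement operations, and then factor out a
common multiple in row 2 and a common multiple in row 3.

$\begin{vmatrix} 1 & a & b + c \\ 1 & b & a + c \\ 1 & c & a + b \end{vmatrix} = \begin{vmatrix} 1 & a & b + c \\ 0 & b - a & a - b \\ 0 & c - a & a - c \end{vmatrix} = (b - a)(c - a)\begin{vmatrix} 1 & a & b + c \\ 0 & 1 & -1 \\ 0 & 1 & -1 \end{vmatrix} = 0$

5. -12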


19. 


7. When the determinant is expanded by cofactors of the first
row, the equation has the form $ax + by + c = 0$, where at
least one of $a$ and $b$ is not zero. This is the equation of a
line. It is clear that $(x_1, y_1)$ and $(x_2, y_2)$ are on the line,
because when the coordinates of one of the points are
substituted for $x$ and $y$, two rows of the matrix are equal
and so the determinant is zero.


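Thus, by Theorem 3,

$\det T = (b - a)(c - a)\det\begin{bmatrix} 1 & a & a^2 \\ 0 & 1 & b + a \\ 0 & 1 & c + a \end{bmatrix} = (b - a)(c - a)\det\begin{bmatrix} 1 & a & a^2 \\ 0 & 1 & b + a \\ 0 & 0 & c - b \end{bmatrix}$
$= (b - a)(c - a)(c - b)$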

Area = 12. If one vertex is subtracted from all four
vertices, and if the new vertices are $\mathbf{0}$, $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, then
the translated figure (and hence the original figure) will be a
parallelogram if and only if one of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ is the sum
of the other two vectors.


By the Inverse Formula, $(\text{adj}\,A) \cdot \dfrac{1}{\det A}A = A^{-1}A = I$.
By the Invertible Matrix Theorem, adj $A$ is invertible and
$(\text{adj}\,A)^{-1} = \dfrac{1}{\det A}A$.


a. $X = CA^{-1}$, $Y = D - CA^{-1}B$. Now use Exercise
14(c).

b. From part (a), and the multiplicative property of
determinants,

$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det[A(D - CA^{-1}B)]$
$= \det[AD - ACA^{-1}B]$
$= \det[AD - CAA^{-1}B]$
$= \det[AD - CB]$

where the equality $AC = CA$ was used in the third step.

First consider the case $n = 2$, and prove that the result
holds by directly computing the determinants of $B$ and $C$.
Now assume that the formula holds for all
$(k - 1) \times (k - 1)$ matrices, and let $A$, $B$, and $C$ be $k \times k$
matrices. Use a cofactor expansion along the first column
and the inductive hypothesis to find $\det B$. Use row
replacement operations on $C$ to create zeros below the first
pivot and produce a triangular matrix. Find the determinant
of this matrix and add it to $\det B$ to get the result.

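[M] Compute:

$\begin{vmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{vmatrix} = 1$, $\begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 \\ 1 & 2 & 3 & 4 \end{vmatrix} = 1$, $\begin{vmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 & 3 \\ 1 & 2 & 3 & 4 & 4 \\ 1 & 2 & 3 & 4 & 5 \end{vmatrix} = 1$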


0 1 b a 

0 1 c a 


0 b-a b 2 -a 2 
0 c — a c 2 - a 2 


1 a b c 
0 1 -1 

0 1 -1 


a. T  b. T  c. F  d. F  e. F  f. F
g. T  h. T  i. F  j. F  k. T  l. F
m. F  n. T  o. F  p. T
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Conjecture: 

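$\begin{vmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{vmatrix} = 1$

To confirm the conjecture, use row replacement operations
to create zeros below the first pivot, then the second pivot,
and so on. The resulting matrix is

$\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 0 & 1 & 1 & \cdots & 1 \\ 0 & 0 & 1 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$

which is an upper triangular matrix with determinant 1.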
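
The conjecture can be tested for small n by building the matrix whose (i, j)-entry is min(i, j) and reducing it with row replacements. This is a numerical check under that reading of the pattern, not a proof.

```python
from fractions import Fraction

def min_matrix(n):
    """n x n matrix whose (i, j)-entry is min(i, j), with 1-based indices."""
    return [[min(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

def det(rows):
    """Determinant by elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    n, sign, result = len(M), 1, Fraction(1)
    for c in range(n):
        pr = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pr is None:
            return Fraction(0)
        if pr != c:
            M[c], M[pr] = M[pr], M[c]
            sign = -sign          # a row interchange flips the sign
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]  # row replacement
        result *= M[c][c]
    return sign * result

for n in range(1, 8):
    print(n, det(min_matrix(n)))
```

Each row replacement leaves the determinant unchanged, so the product of the pivots of the resulting triangular matrix is the determinant.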



Chapter 4 

Section 4.1, page 195 

1. a. $\mathbf{u} + \mathbf{v}$ is in $V$ because its entries will both be
nonnegative.

_ 2 - 


b. Example: If u = 
cu is not in V. 


2 


and c = —1, then u is in V, but 


and c = 4, then u is in H, but cu is 


3. Example: If u = 
not in H. 

5. Yes, by Theorem 1, because the set is Span {t 2 }. 

7. No, the set is not closed under multiplication by scalars that 
are not integers. 

_ - 2 ' 


9. H = Span {y}, where v = 
subspace of R 3 . 

11. W = Span {u, v}, where u 


5 


.By Theorem 1, // is a 


■By 


2 " 


"3" 

-1 

,v = 

0 

0 


2 


Theorem 1, is a subspace of R 3 . 

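13. a. There are only three vectors in $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, and $\mathbf{w}$ is not
one of them.

b. There are infinitely many vectors in Span $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

c. $\mathbf{w}$ is in Span $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ because $\mathbf{w} = \mathbf{v}_1 + \mathbf{v}_2$.

15. $W$ is not a vector space because the zero vector is not in $W$.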


17. S 



2 


-1 


0 



0 


3 


-1 



-1 

5 

0 


3 



0 


3 


0 

, 


19. Hint: Use Theorem 1. 


Warning: Although the Study Guide has complete solutions for 
every odd-numbered exercise whose answer here is only a 
“Hint,” you must really try to work the solution yourself. 
Otherwise, you will not benefit from the exercise. 


21. Yes. The conditions for a subspace are obviously satisfied: 
The zero matrix is in H, the sum of two upper triangular 
matrices is upper triangular, and any scalar multiple of an 
upper triangular matrix is again upper triangular. 


23. See the Study Guide after you have written your answers. 

25. 4 27. a. 8 b. 3 c. 5 d. 4 

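29. $\mathbf{u} + (-1)\mathbf{u} = 1\mathbf{u} + (-1)\mathbf{u}$   Axiom 10
$= [1 + (-1)]\mathbf{u}$   Axiom 8
$= 0\mathbf{u} = \mathbf{0}$   Exercise 27

From Exercise 26, it follows that $(-1)\mathbf{u} = -\mathbf{u}$.

31. Any subspace $H$ that contains $\mathbf{u}$ and $\mathbf{v}$ must also contain all
scalar multiples of $\mathbf{u}$ and $\mathbf{v}$ and hence must contain all sums
of scalar multiples of $\mathbf{u}$ and $\mathbf{v}$. Thus $H$ must contain
Span $\{\mathbf{u}, \mathbf{v}\}$.

33. Hint: For part of the solution, consider $\mathbf{w}_1$ and $\mathbf{w}_2$ in
$H + K$, and write $\mathbf{w}_1$ and $\mathbf{w}_2$ in the form $\mathbf{w}_1 = \mathbf{u}_1 + \mathbf{v}_1$ and
$\mathbf{w}_2 = \mathbf{u}_2 + \mathbf{v}_2$, where $\mathbf{u}_1$ and $\mathbf{u}_2$ are in $H$, and $\mathbf{v}_1$ and $\mathbf{v}_2$ are
in $K$.

35. [M] The reduced echelon form of $[\,\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \mathbf{v}_3 \;\; \mathbf{w}\,]$ shows
that $\mathbf{w} = \mathbf{v}_1 - 2\mathbf{v}_2 + \mathbf{v}_3$. Hence $\mathbf{w}$ is in the subspace
spanned by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

37. [M] The functions are $\cos 4t$ and $\cos 6t$. See Exercise 34 in
Section 4.5.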


Section 4.2, page 205 


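1. $\begin{bmatrix} 3 & -5 & -3 \\ 6 & -2 & 0 \\ -8 & 4 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ -4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$,
so $\mathbf{w}$ is in Nul $A$.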


3. 


5 . 


7. $W$ is not a subspace of $\mathbb{R}^3$ because the zero vector $(0, 0, 0)$
is not in $W$.

9. $W$ is a subspace of $\mathbb{R}^4$ because $W$ is the set of solutions of
the system

$p - 3q - 4s = 0$
$2p - s - 5r = 0$

11. $W$ is not a subspace because $\mathbf{0}$ is not in $W$. Justification: If
a typical element $(s - 2t, 3 + 3s, 3s + t, 2s)$ were zero,
then $3 + 3s = 0$ and $2s = 0$, which is impossible.







































The matrix has a pivot in each row and hence its columns
span $\mathbb{R}^3$.

7. This set does not form a basis for $\mathbb{R}^3$. The set is linearly
independent because one vector is not a multiple of the
other. However, the vectors do not span $\mathbb{R}^3$. The matrix
$\begin{bmatrix} -2 & 6 \\ 3 & -1 \\ 0 & 5 \end{bmatrix}$ can have at most two pivots since it has only
two columns. So there will not be a pivot in each row.


13. Basis for Nul A: 


Basis for Col A: 


-5/2 


-3/2 


c. Part (b) showed that the range of $T$ contains all $B$ such
that $B^T = B$. So it suffices to show that any $B$ in the
range of $T$ has this property. If $B = T(A)$, then by
properties of transposes,

$B^T = (A + A^T)^T = A^T + A^{TT} = A^T + A = B$

d. The kernel of $T$ is $\left\{ \begin{bmatrix} 0 & b \\ -b & 0 \end{bmatrix} : b \text{ real} \right\}$.


35. Hint: Check the three conditions for a subspace. Typical
elements of $T(U)$ have the form $T(\mathbf{u}_1)$ and $T(\mathbf{u}_2)$, where
$\mathbf{u}_1$, $\mathbf{u}_2$ are in $U$.

37. [M] $\mathbf{w}$ is in Col $A$ but not in Nul $A$. (Explain why.)

39. [M] The reduced echelon form of $A$ is

$\begin{bmatrix} 1 & 0 & 1/3 & 0 & 10/3 \\ 0 & 1 & 1/3 & 0 & -26/3 \\ 0 & 0 & 0 & 1 & -4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

Section 4.3, page 213 


1. The $3 \times 3$ matrix $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$ has three pivot
positions. By the Invertible Matrix Theorem, $A$ is invertible
and its columns form a basis for $\mathbb{R}^3$. (See Example 3.)

3. This set does not form a basis for $\mathbb{R}^3$. The set is linearly
dependent and does not span $\mathbb{R}^3$.

5. This set does not form a basis for $\mathbb{R}^3$. The set is linearly
dependent because the zero vector is in the set. However,

$\begin{bmatrix} 3 & -3 & 0 & 0 \\ -3 & 7 & 0 & -3 \\ 0 & 0 & 0 & 5 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 4 & 0 & -3 \\ 0 & 0 & 0 & 5 \end{bmatrix}$


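$T(c\mathbf{p}) = \begin{bmatrix} (c\mathbf{p})(0) \\ (c\mathbf{p})(1) \end{bmatrix} = c \begin{bmatrix} \mathbf{p}(0) \\ \mathbf{p}(1) \end{bmatrix} = cT(\mathbf{p})$

So $T$ is a linear transformation from $\mathbb{P}_2$ into $\mathbb{R}^2$.
b. Any quadratic polynomial that vanishes at 0 and 1 must
be a multiple of $\mathbf{p}(t) = t(t - 1)$. The range of $T$ is $\mathbb{R}^2$.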

33. a. For $A$, $B$ in $M_{2 \times 2}$ and any scalar $c$,

$T(A + B) = (A + B) + (A + B)^T$
$= A + B + A^T + B^T$   (Transpose property)
$= (A + A^T) + (B + B^T) = T(A) + T(B)$
$T(cA) = (cA) + (cA)^T = cA + cA^T$
$= c(A + A^T) = cT(A)$

So $T$ is a linear transformation from $M_{2 \times 2}$ into $M_{2 \times 2}$.
b. If $B$ is any element in $M_{2 \times 2}$ with the property that
$B^T = B$, and if $A = \tfrac{1}{2}B$, then

$T(A) = \tfrac{1}{2}B + \left(\tfrac{1}{2}B\right)^T = \tfrac{1}{2}B + \tfrac{1}{2}B = B$


13. W = Col A for A 


Theorem 3. 

0 2 1 

1 -1 2 

3 1 0 

2 -1 -1 


so $W$ is a vector space by


15. 


17. a. 2 


b. 4 


19. a. 5 


b. 2 


21. The vector 


is in Nul A and the vector 


Col A. Other answers are possible. 

23. w is in both Nul A and Col A. 

25. See the Study Guide. By now you should know how to use 
it properly. 


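27. Let $\mathbf{x} = \begin{bmatrix} 3 \\ 2 \\ -1 \end{bmatrix}$ and $A = \begin{bmatrix} 1 & -3 & -3 \\ -2 & 4 & 2 \\ -1 & 5 & 7 \end{bmatrix}$. Then $\mathbf{x}$ is in
Nul $A$. Since Nul $A$ is a subspace of $\mathbb{R}^3$, $10\mathbf{x}$ is in Nul $A$.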

29. a. $A\mathbf{0} = \mathbf{0}$, so the zero vector is in Col $A$.

b. By a property of matrix multiplication,
$A\mathbf{x} + A\mathbf{w} = A(\mathbf{x} + \mathbf{w})$, which shows that $A\mathbf{x} + A\mathbf{w}$ is a
linear combination of the columns of $A$ and hence is in
Col $A$.

c. $c(A\mathbf{x}) = A(c\mathbf{x})$, which shows that $c(A\mathbf{x})$ is in Col $A$ for
all scalars $c$.

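31. a. For arbitrary polynomials $\mathbf{p}$, $\mathbf{q}$ in $\mathbb{P}_2$ and any scalar $c$,

$T(\mathbf{p} + \mathbf{q}) = \begin{bmatrix} (\mathbf{p} + \mathbf{q})(0) \\ (\mathbf{p} + \mathbf{q})(1) \end{bmatrix} = \begin{bmatrix} \mathbf{p}(0) + \mathbf{q}(0) \\ \mathbf{p}(1) + \mathbf{q}(1) \end{bmatrix} = \begin{bmatrix} \mathbf{p}(0) \\ \mathbf{p}(1) \end{bmatrix} + \begin{bmatrix} \mathbf{q}(0) \\ \mathbf{q}(1) \end{bmatrix} = T(\mathbf{p}) + T(\mathbf{q})$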


2 0 1 


3 10 



ox- II 11 _^u 


9. 






















































15. $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4, \mathbf{v}_5\}$    17. [M] $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_5\}$

19. The three simplest answers are $\{\mathbf{v}_1, \mathbf{v}_2\}$, $\{\mathbf{v}_1, \mathbf{v}_3\}$, and
$\{\mathbf{v}_2, \mathbf{v}_3\}$. Other answers are possible.

21. See the Study Guide for hints.

23. Hint: Use the Invertible Matrix Theorem.

25. No. (Why is the set not a basis for $H$?)

27. $\{\cos \omega t, \sin \omega t\}$

29. Let $A$ be the $n \times k$ matrix $[\,\mathbf{v}_1\ \cdots\ \mathbf{v}_k\,]$. Since $A$ has
fewer columns than rows, there cannot be a pivot position in
each row of $A$. By Theorem 4 in Section 1.4, the columns
of $A$ do not span $\mathbb{R}^n$ and hence are not a basis for $\mathbb{R}^n$.

31. Hint: If $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ is linearly dependent, then there exist
$c_1, \ldots, c_p$, not all zero, such that $c_1\mathbf{v}_1 + \cdots + c_p\mathbf{v}_p = \mathbf{0}$.
Use this equation.

33. Neither polynomial is a multiple of the other polynomial, so
$\{\mathbf{p}_1, \mathbf{p}_2\}$ is a linearly independent set in $\mathbb{P}_3$.

35. Let $\{\mathbf{v}_1, \mathbf{v}_3\}$ be any linearly independent set in the vector
space $V$, and let $\mathbf{v}_2$ and $\mathbf{v}_4$ be linear combinations of $\mathbf{v}_1$ and
$\mathbf{v}_3$. Then $\{\mathbf{v}_1, \mathbf{v}_3\}$ is a basis for Span$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$.

37. [M] You could be clever and find special values of $t$ that
produce several zeros in (5), and thereby create a system of
equations that can be solved easily by hand. Or, you could
use values of $t$ such as $t = 0, .1, .2, \ldots$ to create a system of
equations that you can solve with a matrix program.


Section 4.4, page 222 


3' 

_-7_ 

3. 

"-7" 

4 

3 

5. 

2 ' 

-1 

7. 

"-1 " 
-1 

3 

_ 1 
-3 

2 " 

-5_ 

11 . 

_5_ 

_ 1 _ 

13. 

2 " 

6 

-1 



15. The Study Guide has hints. 

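17. $\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 5\mathbf{v}_1 - 2\mathbf{v}_2 = 10\mathbf{v}_1 - 3\mathbf{v}_2 + \mathbf{v}_3 = -\mathbf{v}_2 - \mathbf{v}_3$
(infinitely many answers)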


19. Hint: By hypothesis, the zero vector has a unique 

representation as a linear combination of elements of S. 

9 2' 


21 . 


4 


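23. Hint: Suppose that $[\mathbf{u}]_{\mathcal{B}} = [\mathbf{w}]_{\mathcal{B}}$ for some $\mathbf{u}$ and $\mathbf{w}$ in $V$, and
denote the entries in $[\mathbf{u}]_{\mathcal{B}}$ by $c_1, \ldots, c_n$. Use the definition
of $[\mathbf{u}]_{\mathcal{B}}$.

25. One possible approach: First, show that if $\mathbf{u}_1, \ldots, \mathbf{u}_p$ are
linearly dependent, then $[\mathbf{u}_1]_{\mathcal{B}}, \ldots, [\mathbf{u}_p]_{\mathcal{B}}$ are linearly
dependent. Second, show that if $[\mathbf{u}_1]_{\mathcal{B}}, \ldots, [\mathbf{u}_p]_{\mathcal{B}}$ are
linearly dependent, then $\mathbf{u}_1, \ldots, \mathbf{u}_p$ are linearly dependent.
Use the two equations displayed in the exercise. A slightly
different proof is given in the Study Guide.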


27. Linearly independent. (Justify answers to Exercises 27-34.) 
29. Linearly dependent. 


31. a. The coordinate vectors 

1 

-3 


-3 

5 


-4 

5 


5 


-7 


—6 


0 do not span R 3 . Because of the isomorphism 


between R . 3 and P 2 , the corresponding polynomials do 
not span P 2 . 


b. The coordinate vectors 

0 

5 


1 

-8 


-3 

4 


2 

-3 


1 


-2 


2 


0 


span $\mathbb{R}^3$. Because of the isomorphism between $\mathbb{R}^3$ and
$\mathbb{P}_2$, the corresponding polynomials span $\mathbb{P}_2$.


33. [M] The coordinate vectors 


3 


5 


0 

7 


1 


1 

0 


0 


-2 

0 


-2 


0 


16 

—6 

2 


are a linearly dependent subset of $\mathbb{R}^4$. Because of
the isomorphism between $\mathbb{R}^4$ and $\mathbb{P}_3$, the corresponding
polynomials form a linearly dependent subset of $\mathbb{P}_3$, and
thus cannot be a basis for $\mathbb{P}_3$.


35. 


[M] [x] B = 


-5/3 

8/3 


37. [M] 


1.3 

0 

0.8 


Section 4.5, page 229 

- 2 ' 


0 


;dim is 2 


3. 


5. 


0 


0 


2 

1 


-1 


0 

0 

， 

1 


-3 

1 


2 


0 

1 

-2 

0 


dim is 3 


dim is 3 


7. No basis; dim is 0 


9. 2 


11. 3 13. 2 ,： 


15. 2,3 17. 0,3 

19. See the Study Guide. 

21. Hint: You need only show that the first four Hermite 
polynomials are linearly independent. Why? 

23. [p] B = (3,6,2, 1) 

25. Hint: Suppose S does span V, and use the Spanning Set 
Theorem. This leads to a contradiction, which shows that 
the spanning hypothesis is false. 






































































27. Hint: Use the fact that each $\mathbb{P}_n$ is a subspace of $\mathbb{P}$.

29. Justify each answer. 

a. True b. True c. True 

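31. Hint: Since $H$ is a nonzero subspace of a finite-dimensional
space, $H$ is finite-dimensional and has a basis, say,
$\mathbf{v}_1, \ldots, \mathbf{v}_p$. First show that $\{T(\mathbf{v}_1), \ldots, T(\mathbf{v}_p)\}$ spans $T(H)$.

33. [M] a. One basis is $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{e}_2, \mathbf{e}_3\}$. In fact, any two of
the vectors $\mathbf{e}_2, \ldots, \mathbf{e}_5$ will extend $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ to a basis of
$\mathbb{R}^5$.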

Section 4.6, page 236 
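1. rank $A = 2$; dim Nul $A = 2$;

Basis for Col $A$: $\begin{bmatrix} 1 \\ -1 \\ 5 \end{bmatrix}, \begin{bmatrix} -4 \\ 2 \\ -6 \end{bmatrix}$

Basis for Row $A$: $(1, 0, -1, 5)$, $(0, -2, 5, -6)$

Basis for Nul $A$: $\begin{bmatrix} 1 \\ 5/2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -5 \\ -3 \\ 0 \\ 1 \end{bmatrix}$

3. rank $A = 3$; dim Nul $A = 3$;

Basis for Col $A$: $\begin{bmatrix} 2 \\ -2 \\ 4 \\ -2 \end{bmatrix}, \begin{bmatrix} 6 \\ -3 \\ 9 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 3 \\ 3 \end{bmatrix}$

Basis for Row $A$: $(2, 6, -6, 6, 3, 6)$, $(0, 3, 0, 3, 3, 0)$,
$(0, 0, 0, 0, 3, 0)$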



5. 4,3,3 

7. Yes; no. Since Col $A$ is a four-dimensional subspace of $\mathbb{R}^4$,
it coincides with $\mathbb{R}^4$. The null space cannot be $\mathbb{R}^3$, because
the vectors in Nul $A$ have 7 entries. Nul $A$ is a
three-dimensional subspace of $\mathbb{R}^7$, by the Rank Theorem.

9. 3, no. Notice that the columns of a $4 \times 6$ matrix are in $\mathbb{R}^4$,
rather than $\mathbb{R}^3$. Col $A$ is a three-dimensional subspace of
$\mathbb{R}^4$.

11 . 2 

13. 5, 5. In both cases, the number of pivots cannot exceed the 
number of columns or the number of rows. 

15. 4 17. See the Study Guide. 

19. Yes. Try to write an explanation before you consult the 
Study Guide. 

21. No. Explain why. 

23. Yes. Only six homogeneous linear equations are necessary. 


25. No. Explain why. 

27. Row $A$ and Nul $A$ are in $\mathbb{R}^n$; Col $A$ and Nul $A^T$ are in $\mathbb{R}^m$.
There are only four distinct subspaces because
Row $A^T$ = Col $A$ and Col $A^T$ = Row $A$.

29. Recall that dim Col $A = m$ precisely when Col $A = \mathbb{R}^m$, or
equivalently, when the equation $A\mathbf{x} = \mathbf{b}$ is consistent for all
$\mathbf{b}$. By Exercise 28(b), dim Col $A = m$ precisely when
dim Nul $A^T = 0$, or equivalently, when the equation
$A^T\mathbf{x} = \mathbf{0}$ has only the trivial solution.



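31. $\mathbf{u}\mathbf{v}^T = \begin{bmatrix} 2a & 2b & 2c \\ -3a & -3b & -3c \\ 5a & 5b & 5c \end{bmatrix}$. The columns are all
multiples of $\mathbf{u}$, so Col $\mathbf{u}\mathbf{v}^T$ is one-dimensional, unless
$a = b = c = 0$.

33. Hint: Let $A = [\,\mathbf{u}\ \ \mathbf{u}_2\ \ \mathbf{u}_3\,]$. If $\mathbf{u} \neq \mathbf{0}$, then $\mathbf{u}$ is a basis for
Col $A$. Why?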

35. [M] Hint: See Exercise 28 and the remarks before Example 
4. 

37. [M] The matrices C and R given for Exercise 35 work 
here, and A = CR. 


Section 4.7, page 242 


1. a. 

6 

-2 

9" 

-4 


b. 

0 

3. (ii) 

4 

-1 

0" 


"8" 

5. a. 

-1 

1 

1 

b. 

2 


0 

1 

-2 


2 


P . 

■-3 r 

P . 

"-2 r 

C—B — 

—5 2_ 

, B<-C — 

-5 3_ 


p . 

'2 3' 

P . 

1 

■ 1 

3" 

c^b — 

0-1 

> - 

2 

_0 

-2_ 


11. See the Study Guide. 


13. C Z B = 

13 0" 

-2 -5 2 

， [ — 1 + 2t]t3 — 

5" 

-2 


1 4 3 


1 


15. a. $\mathcal{B}$ is a basis for $V$.

b. The coordinate mapping is a linear transformation. 

c. The product of a matrix and a vector 

d. The coordinate vector of y relative to B 

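17. a. [M]

$P^{-1} = \dfrac{1}{32} \begin{bmatrix} 32 & 0 & 16 & 0 & 12 & 0 & 10 \\ 0 & 32 & 0 & 24 & 0 & 20 & 0 \\ 0 & 0 & 16 & 0 & 16 & 0 & 15 \\ 0 & 0 & 0 & 8 & 0 & 10 & 0 \\ 0 & 0 & 0 & 0 & 4 & 0 & 6 \\ 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$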















































b. $\cos^2 t = (1/2)[1 + \cos 2t]$
$\cos^3 t = (1/4)[3\cos t + \cos 3t]$
$\cos^4 t = (1/8)[3 + 4\cos 2t + \cos 4t]$
$\cos^5 t = (1/16)[10\cos t + 5\cos 3t + \cos 5t]$
$\cos^6 t = (1/32)[10 + 15\cos 2t + 6\cos 4t + \cos 6t]$
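
The power-reduction identities above are easy to spot-check numerically. The following is a minimal sketch, not part of the printed answer: the table `IDENTITIES` and the helper names are illustrative choices, and the check only samples points rather than proving anything.

```python
import math

# Each entry: n -> (scale, {m: coefficient of cos(m t)}), so that
# cos^n t = scale * sum(coefficient * cos(m t)).  m = 0 gives a constant term.
IDENTITIES = {
    2: (1 / 2, {0: 1, 2: 1}),
    3: (1 / 4, {1: 3, 3: 1}),
    4: (1 / 8, {0: 3, 2: 4, 4: 1}),
    5: (1 / 16, {1: 10, 3: 5, 5: 1}),
    6: (1 / 32, {0: 10, 2: 15, 4: 6, 6: 1}),
}


def rhs(n, t):
    """Evaluate the right-hand side of the identity for cos^n t."""
    scale, terms = IDENTITIES[n]
    return scale * sum(c * math.cos(m * t) for m, c in terms.items())


def max_error(n, samples=200):
    """Largest gap between cos^n t and its expansion on a grid of t values."""
    grid = (0.05 * i for i in range(samples))
    return max(abs(math.cos(t) ** n - rhs(n, t)) for t in grid)


if __name__ == "__main__":
    for n in sorted(IDENTITIES):
        print(n, max_error(n))
```

The coefficients in each row, divided by the scale factor, are the entries that appear in the corresponding column of a change-of-coordinates matrix between the bases $\{1, \cos t, \ldots, \cos^6 t\}$ and $\{1, \cos t, \cos 2t, \ldots, \cos 6t\}$.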


independent. Since dim H = 2, the signals form a basis for 
H, by the Basis Theorem. 

7. Yes 9. Yes 

11. No, two signals cannot span the three-dimensional solution 
space. 


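19. [M] Hint: Let $\mathcal{C}$ be the basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Then the columns
of $P$ are $[\mathbf{u}_1]_{\mathcal{C}}$, $[\mathbf{u}_2]_{\mathcal{C}}$, and $[\mathbf{u}_3]_{\mathcal{C}}$. Use the definition of
$\mathcal{C}$-coordinate vectors and matrix algebra to compute $\mathbf{u}_1$, $\mathbf{u}_2$,
$\mathbf{u}_3$. The solution method is discussed in the Study Guide.
Here are the numerical answers:

a. $\mathbf{u}_1 = \begin{bmatrix} -6 \\ -5 \\ 21 \end{bmatrix}$, $\mathbf{u}_2 = \begin{bmatrix} -6 \\ -9 \\ 32 \end{bmatrix}$, $\mathbf{u}_3 = \begin{bmatrix} -5 \\ 0 \\ 3 \end{bmatrix}$

b. $\mathbf{w}_1 = \begin{bmatrix} 28 \\ -9 \\ -3 \end{bmatrix}$, $\mathbf{w}_2 = \begin{bmatrix} 38 \\ -13 \\ 2 \end{bmatrix}$, $\mathbf{w}_3 = \begin{bmatrix} 21 \\ -7 \\ 3 \end{bmatrix}$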


i3. d) k ,(i) k is. c,r A -i) k 

17. $y_k = c_1(.8)^k + c_2(.5)^k + 10 \to 10$ as $k \to \infty$

19. $y_k = c_1(-2 + \sqrt{3})^k + c_2(-2 - \sqrt{3})^k$

21. 7, 5, 4, 3, 4, 5, 6, 6, 7, 8, 9, 8, 7; see figure:

[Figure: graph of the signal versus $k$]


Section 4.8, page 251


23. a. $y_{k+1} - 1.01y_k = -450$, $y_0 = 10{,}000$


1. If $y_k = 2^k$, then $y_{k+1} = 2^{k+1}$ and $y_{k+2} = 2^{k+2}$.
Substituting these formulas into the left side of the equation
gives

$y_{k+2} + 2y_{k+1} - 8y_k = 2^{k+2} + 2 \cdot 2^{k+1} - 8 \cdot 2^k$
$= 2^k(2^2 + 2 \cdot 2 - 8)$
$= 2^k(0) = 0$ for all $k$

Since the difference equation holds for all $k$, $2^k$ is a
solution. A similar calculation works for $y_k = (-4)^k$.
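
The substitution argument above can be mirrored by a direct check over a range of indices. This is a small sketch using exact integer arithmetic; the function names are illustrative, and checking finitely many $k$ is a sanity test rather than a proof.

```python
def residual(signal, k):
    """Left side of y_{k+2} + 2*y_{k+1} - 8*y_k = 0 for the given signal."""
    return signal(k + 2) + 2 * signal(k + 1) - 8 * signal(k)


def solves(signal, ks=range(0, 20)):
    """True if the signal satisfies the difference equation at every k tested."""
    return all(residual(signal, k) == 0 for k in ks)


if __name__ == "__main__":
    print(solves(lambda k: 2 ** k))      # the signal 2^k
    print(solves(lambda k: (-4) ** k))   # the signal (-4)^k
    print(solves(lambda k: 3 ** k))      # a signal that is not a solution
```

Because Python integers are exact, the residual is exactly zero for the two solution signals, with no rounding tolerance needed.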

3. The signals $2^k$ and $(-4)^k$ are linearly independent because
neither is a multiple of the other. For instance, there is no
scalar $c$ such that $2^k = c(-4)^k$ for all $k$. By Theorem 17,
the solution set $H$ of the difference equation in Exercise 1
is two-dimensional. By the Basis Theorem in Section 4.5,
the two linearly independent signals $2^k$ and $(-4)^k$ form a
basis for $H$.


25. $c_1 \cdot (-4)^k + c_2$    27. $-2 + k + c_1 \cdot 2^k + c_2 \cdot (-2)^k$

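29. $\mathbf{x}_{k+1} = A\mathbf{x}_k$, where

$\mathbf{x}_k = \begin{bmatrix} y_k \\ y_{k+1} \\ y_{k+2} \\ y_{k+3} \end{bmatrix}$, $A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 2 & -6 & 8 & -3 \end{bmatrix}$

31. The equation holds for all $k$, so it holds with $k$ replaced by
$k - 1$, which transforms the equation into

$y_{k+2} + 5y_{k+1} + 6y_k = 0$ for all $k$

The equation is of order 2.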

33. For all k, the Casorati matrix C(k) is not invertible. In this 
case, the Casorati matrix gives no information about the 
linear independence/dependence of the set of signals. In 
fact, neither signal is a multiple of the other, so they are 
linearly independent. 


5. If $y_k = (-2)^k$, then

$y_{k+2} + 4y_{k+1} + 4y_k = (-2)^{k+2} + 4(-2)^{k+1} + 4(-2)^k$
$= (-2)^k[(-2)^2 + 4(-2) + 4]$
$= (-2)^k(0) = 0$ for all $k$

Similarly, if $y_k = k(-2)^k$, then

$y_{k+2} + 4y_{k+1} + 4y_k$
$= (k + 2)(-2)^{k+2} + 4(k + 1)(-2)^{k+1} + 4k(-2)^k$
$= (-2)^k[(k + 2)(-2)^2 + 4(k + 1)(-2) + 4k]$
$= (-2)^k[4k + 8 - 8k - 8 + 4k]$
$= (-2)^k(0)$ for all $k$

Thus both $(-2)^k$ and $k(-2)^k$ are in the solution space $H$ of
the difference equation. Also, there is no scalar $c$ such that
$k(-2)^k = c(-2)^k$ for all $k$, because $c$ must be chosen
independently of $k$. Likewise, there is no scalar $c$ such that
$(-2)^k = ck(-2)^k$ for all $k$. So the two signals are linearly


35. Hint: Verify the two properties that define a linear
transformation. For $\{y_k\}$ and $\{z_k\}$ in $\mathbb{S}$, study
$T(\{y_k\} + \{z_k\})$. Note that if $r$ is any scalar, then the $k$th
term of $r\{y_k\}$ is $ry_k$; so $T(r\{y_k\})$ is the sequence $\{w_k\}$
given by

$w_k = ry_{k+2} + a(ry_{k+1}) + b(ry_k)$

37. Hint: Find $TD(y_0, y_1, y_2, \ldots)$ and $DT(y_0, y_1, y_2, \ldots)$.


Section 4.9, page 260 


1. a.
      From:
       N     M      To:
      .7    .6     News
      .3    .4     Music

3. a.
      From:
       H     I      To:
      .95   .45    Healthy
      .05   .55    Ill

b. ^ c. 33% 


b. 15%, 12.5% 




























c. .925; use Xo 


5. 


5/14. 

9/14 


7. 


0 _. 

1/4 _ 
1/2 
1/4 


9. Yes, because $P^2$ has all positive entries.
’2/3 — 


11. a. 


13. a. 


1/3 

.9 


b. 2/3 


b. .10, no 


15. [M] About 13.9% of the United States population 

17. a. The entries in a column of $P$ sum to 1. A column in the
matrix $P - I$ has the same entries as in $P$ except that
one of the entries is decreased by 1. Hence each column
sum is 0.

b. By part (a), the bottom row of $P - I$ is the negative of
the sum of the other rows.

c. By part (b) and the Spanning Set Theorem, the bottom
row of $P - I$ can be removed and the remaining
$(n - 1)$ rows will still span the row space.
Alternatively, use part (a) and the fact that row
operations do not change the row space. Let $A$ be the
matrix obtained from $P - I$ by adding to the bottom
row all the other rows. By part (a), the row space is
spanned by the first $(n - 1)$ rows of $A$.

d. By the Rank Theorem and part (c), the dimension of the
column space of $P - I$ is less than $n$, and hence the null
space is nontrivial. Instead of the Rank Theorem, you
may use the Invertible Matrix Theorem, since $P - I$ is
a square matrix.

7. You would have to know that the solution set of the
homogeneous system is spanned by two solutions. In this
case, the null space of the $18 \times 20$ coefficient matrix $A$ is at
most two-dimensional. By the Rank Theorem,
dim Col $A \geq 20 - 2 = 18$, which means that Col $A = \mathbb{R}^{18}$,
because $A$ has 18 rows, and every equation $A\mathbf{x} = \mathbf{b}$ is
consistent.

9. Let $A$ be the standard $m \times n$ matrix of the transformation $T$.
a. If $T$ is one-to-one, then the columns of $A$ are linearly
independent (Theorem 12 in Section 1.9), so
dim Nul $A = 0$. By the Rank Theorem,
dim Col $A$ = rank $A = n$. Since the range of $T$ is Col $A$,
the dimension of the range of $T$ is $n$.

b. If $T$ is onto, then the columns of $A$ span $\mathbb{R}^m$ (Theorem
12 in Section 1.9), so dim Col $A = m$. By the Rank
Theorem, dim Nul $A = n - \dim \text{Col } A = n - m$. Since
the kernel of $T$ is Nul $A$, the dimension of the kernel of
$T$ is $n - m$.

19. a. The product $S\mathbf{x}$ equals the sum of the entries in $\mathbf{x}$. For a
probability vector, this sum must be 1.

b. $P = [\,\mathbf{p}_1\ \mathbf{p}_2\ \cdots\ \mathbf{p}_n\,]$, where the $\mathbf{p}_i$ are probability
vectors. By matrix multiplication and part (a),

$SP = [\,S\mathbf{p}_1\ \ S\mathbf{p}_2\ \cdots\ S\mathbf{p}_n\,] = [\,1\ \ 1\ \cdots\ 1\,] = S$

c. By part (b), $S(P\mathbf{x}) = (SP)\mathbf{x} = S\mathbf{x} = 1$. Also, the
entries in $P\mathbf{x}$ are nonnegative (because $P$ and $\mathbf{x}$ have
nonnegative entries). Hence, by (a), $P\mathbf{x}$ is a probability
vector.
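
The argument in parts (a) through (c) can be illustrated on a small stochastic matrix. The sketch below uses the Healthy/Ill matrix from the Section 4.9 answers; the starting vector `x` and the helper names are assumptions chosen only for illustration, and exact fractions are used so the sums come out to exactly 1.

```python
from fractions import Fraction as F


def mat_vec(P, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def column_sums(P):
    """The row vector S*P, where S = [1 1 ... 1]."""
    return [sum(col) for col in zip(*P)]


def is_probability_vector(x):
    """Nonnegative entries that sum to 1."""
    return all(e >= 0 for e in x) and sum(x) == 1


# Stochastic matrix with columns "from Healthy" and "from Ill".
P = [[F(95, 100), F(45, 100)],
     [F(5, 100), F(55, 100)]]

# Hypothetical starting distribution, chosen only for this example.
x = [F(1, 2), F(1, 2)]

if __name__ == "__main__":
    print(column_sums(P))                         # S*P
    print(mat_vec(P, x))                          # P*x
    print(is_probability_vector(mat_vec(P, x)))
```

Repeating `mat_vec` on its own output gives the later state vectors of the chain, each of which is again a probability vector by the same reasoning.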

Chapter 4 Supplementary Exercises, page 262 


1. a. T   b. T   c. F   d. F   e. T   f. T   g. F   h. F   i. T   j. F
k. F   l. F   m. T   n. F   o. T   p. T   q. F   r. T   s. T   t. F










3. The set of all $(b_1, b_2, b_3)$ satisfying $b_1 + 2b_2 + b_3 = 0$.

5. The vector $\mathbf{p}_1$ is not zero and $\mathbf{p}_2$ is not a multiple of $\mathbf{p}_1$, so
keep both of these vectors. Since $\mathbf{p}_3 = 2\mathbf{p}_1 + 2\mathbf{p}_2$, discard
$\mathbf{p}_3$. Since $\mathbf{p}_4$ has a $t^2$ term, it cannot be a linear combination
of $\mathbf{p}_1$ and $\mathbf{p}_2$, so keep $\mathbf{p}_4$. Finally, $\mathbf{p}_5 = \mathbf{p}_1 + \mathbf{p}_4$, so discard
$\mathbf{p}_5$. The resulting basis is $\{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_4\}$.


11. If $S$ is a finite spanning set for $V$, then a subset of $S$, say
$S'$, is a basis for $V$. Since $S'$ must span $V$, $S'$ cannot be a
proper subset of $S$ because of the minimality of $S$. Thus
$S' = S$, which proves that $S$ is a basis for $V$.

12. a. Hint: Any y in Col AB has the form y = ABx for some 

x. 

13. By Exercise 9, rank $PA \leq$ rank $A$, and
rank $A$ = rank $P^{-1}PA \leq$ rank $PA$. Thus
rank $PA$ = rank $A$.

15. The equation $AB = 0$ shows that each column of $B$ is in
Nul $A$. Since Nul $A$ is a subspace, all linear combinations of
the columns of $B$ are in Nul $A$, so Col $B$ is a subspace of
Nul $A$. By Theorem 11 in Section 4.5,
dim Col $B \leq$ dim Nul $A$. Applying the Rank Theorem, we
find that

$n = \text{rank } A + \dim \text{Nul } A \geq \text{rank } A + \text{rank } B$


17. a. Let $A_1$ consist of the $r$ pivot columns in $A$. The
columns of $A_1$ are linearly independent. So $A_1$ is an
$m \times r$ submatrix with rank $r$.

b. By the Rank Theorem applied to $A_1$, the dimension of
Row $A_1$ is $r$, so $A_1$ has $r$ linearly independent rows. Use
them to form $A_2$. Then $A_2$ is $r \times r$ with linearly
independent rows. By the Invertible Matrix Theorem,
$A_2$ is invertible.

19. $[\,B\ \ AB\ \ A^2B\,] = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -.9 & .81 \\ 1 & .5 & .25 \end{bmatrix} \sim \begin{bmatrix} 1 & -.9 & .81 \\ 0 & 1 & 0 \\ 0 & 0 & -.56 \end{bmatrix}$

This matrix has rank 3, so the pair $(A, B)$ is controllable.
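
The rank computation behind this controllability check can be reproduced with exact arithmetic. The routine below is a minimal sketch of forward elimination with row swaps; the function name `rank` and the choice to pass entries as decimal strings (so that `Fraction` parses them exactly) are illustrative assumptions.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as rows of decimal strings, by forward elimination."""
    m = [[Fraction(entry) for entry in row] for row in rows]
    pivots = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot_row = next((i for i in range(pivots, len(m)) if m[i][col] != 0), None)
        if pivot_row is None:
            continue
        m[pivots], m[pivot_row] = m[pivot_row], m[pivots]
        # Eliminate the entries below the pivot.
        for i in range(pivots + 1, len(m)):
            factor = m[i][col] / m[pivots][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[pivots])]
        pivots += 1
        if pivots == len(m):
            break
    return pivots


# The matrix [B  AB  A^2 B] from the answer above.
controllability = [["0", "1", "0"],
                   ["1", "-.9", ".81"],
                   ["1", ".5", ".25"]]

if __name__ == "__main__":
    print(rank(controllability))
```

A rank equal to the number of rows corresponds to the "controllable" conclusion; a smaller rank would indicate the opposite.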


21. [M] rank $[\,B\ \ AB\ \ A^2B\ \ A^3B\,] = 3$. The pair $(A, B)$ is
not controllable.


















Chapter 5 

Section 5.1, page 271 

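1. Yes    3. Yes, $\lambda = -2$    5. Yes, $\lambda = -5$


7.


Yes, 1



11. $\lambda = -1$: 3 2 ; $\lambda = 7$:



13. $\lambda = 1$: $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$; $\lambda = 2$: $\begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}$; $\lambda = 3$: $\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$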


15. 1,0 17. 0,3,-2 

oj [ i_ 

19. 0. Justify your answer. 

21. See the Study Guide, after you have written your answers. 
23. Hint: Use Theorem 2. 

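25. Hint: Use the equation $A\mathbf{x} = \lambda\mathbf{x}$ to find an equation
involving $A^{-1}$.

27. Hint: For any $\lambda$, $(A - \lambda I)^T = A^T - \lambda I$. By a theorem
(which one?), $A^T - \lambda I$ is invertible if and only if $A - \lambda I$ is
invertible.

29. Let $\mathbf{v}$ be the vector in $\mathbb{R}^n$ whose entries are all 1's. Then
$A\mathbf{v} = s\mathbf{v}$.

31. Hint: If $A$ is the standard matrix of $T$, look for a nonzero
vector $\mathbf{v}$ (a point in the plane) such that $A\mathbf{v} = \mathbf{v}$.

33. a. $\mathbf{x}_{k+1} = c_1\lambda^{k+1}\mathbf{u} + c_2\mu^{k+1}\mathbf{v}$
b. $A\mathbf{x}_k = A(c_1\lambda^k\mathbf{u} + c_2\mu^k\mathbf{v})$
$= c_1\lambda^k A\mathbf{u} + c_2\mu^k A\mathbf{v}$   (Linearity)
$= c_1\lambda^k \lambda\mathbf{u} + c_2\mu^k \mu\mathbf{v}$   ($\mathbf{u}$ and $\mathbf{v}$ are eigenvectors.)
$= \mathbf{x}_{k+1}$

37. [M] $\lambda = 5$: $\begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}$; $\lambda = 10$: $\begin{bmatrix} -3 \\ 2 \\ 1 \end{bmatrix}$; $\lambda = 15$: $\begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$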


39. [M] A 


A = 12: 


-4: 


"0" 


"6" 


"0" 

1 


3 


0 

2 

;A = —8: 

3 

, 

-1 

0 


2 


0 

_ 1_ 


0 


_1_ 


0 " 


" 2 " 

0 


1 

-1 

, 

2 

1 


0 

0 


1 


Section 5.2, page 279 

1. $\lambda^2 - 4\lambda - 45$; 9, $-5$    3. $\lambda^2 - 3\lambda - 40$; $-5$, 8

5. $\lambda^2 - 16\lambda + 48$; 4, 12
7. $\lambda^2 - 9\lambda + 32$; no real eigenvalues
9. $-\lambda^3 + 10\lambda^2 - 33\lambda + 36$    11. $-\lambda^3 + 8\lambda^2 - 19\lambda + 12$

13. $-\lambda^3 + 18\lambda^2 - 95\lambda + 150$    15. 2, 3, 5, 5

17. 3, 3, 1, 1, 0

19. Hint: The equation given holds for all $\lambda$.

21. The Study Guide has hints.

23. Hint: Find an invertible matrix $P$ such that $RQ = P^{-1}AP$.

r -i 1 

25. a. {vi, V 2 }, where y 2 = ^ is an eigenvector for A = .3 

b. x 0 = Vi - ^v 2 

c. X L = Vi - ^(.3)y 2 ,x 2 = Vi - n(. 3 ) 2 V2, and 

x k =\i — As A: — 00 , （ .3)* — 0 and 

Xk 

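27. a. $A\mathbf{v}_1 = \mathbf{v}_1$, $A\mathbf{v}_2 = .5\mathbf{v}_2$, $A\mathbf{v}_3 = .2\mathbf{v}_3$. (This also shows
that the eigenvalues of $A$ are 1, .5, and .2.)

b. $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent because the
eigenvectors correspond to distinct eigenvalues
(Theorem 2). Since there are 3 vectors in the set, the set
is a basis for $\mathbb{R}^3$. So there exist (unique) constants such
that

$\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$
Then

$\mathbf{w}^T\mathbf{x}_0 = c_1\mathbf{w}^T\mathbf{v}_1 + c_2\mathbf{w}^T\mathbf{v}_2 + c_3\mathbf{w}^T\mathbf{v}_3$   (*)

Since $\mathbf{x}_0$ and $\mathbf{v}_1$ are probability vectors and since the
entries in $\mathbf{v}_2$ and in $\mathbf{v}_3$ each sum to 0, (*) shows that
$1 = c_1$.

c. By part (b),

$\mathbf{x}_0 = \mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$
Using part (a),

$\mathbf{x}_k = A^k\mathbf{x}_0 = A^k\mathbf{v}_1 + c_2A^k\mathbf{v}_2 + c_3A^k\mathbf{v}_3$
$= \mathbf{v}_1 + c_2(.5)^k\mathbf{v}_2 + c_3(.2)^k\mathbf{v}_3$
$\to \mathbf{v}_1$ as $k \to \infty$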











































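5. a. $9 - 3t + t^2 + t^3$

b. For any $\mathbf{p}$, $\mathbf{q}$ in $\mathbb{P}_2$ and any scalar $c$,

$T[\mathbf{p}(t) + \mathbf{q}(t)] = (t + 3)[\mathbf{p}(t) + \mathbf{q}(t)]$
$= (t + 3)\mathbf{p}(t) + (t + 3)\mathbf{q}(t)$
$= T[\mathbf{p}(t)] + T[\mathbf{q}(t)]$

$T[c \cdot \mathbf{p}(t)] = (t + 3)[c \cdot \mathbf{p}(t)] = c \cdot (t + 3)\mathbf{p}(t)$
$= c \cdot T[\mathbf{p}(t)]$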


21. See the Study Guide. 23. Yes. (Explain why.) 

25. No, A must be diagonalizable. (Explain why.) 

27. Hint: Write $A = PDP^{-1}$. Since $A$ is invertible, 0 is not an
eigenvalue of $A$, so $D$ has nonzero entries on its diagonal.

1 1 " 


3 0 0 
1 3 0 
0 1 3 
0 0 1 


7. 


Section 5.4, page 293 


-1 0 

6 4 


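3. a. $T(\mathbf{e}_1) = \mathbf{b}_3$, $T(\mathbf{e}_2) = -\mathbf{b}_1 - 2\mathbf{b}_2$, $T(\mathbf{e}_3) = 2\mathbf{b}_1 + 3\mathbf{b}_3$

b. $[T(\mathbf{e}_1)]_{\mathcal{B}} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$, $[T(\mathbf{e}_2)]_{\mathcal{B}} = \begin{bmatrix} -1 \\ -2 \\ 0 \end{bmatrix}$, $[T(\mathbf{e}_3)]_{\mathcal{B}} = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}$

c. $\begin{bmatrix} 0 & -1 & 2 \\ 0 & -2 & 0 \\ 1 & 0 & 3 \end{bmatrix}$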

35. [M] P 


D 


2 1 -3 


1 


10 0 10 

0100-1 
0 10 10 

0 0 2 0 1 


0 0 7 0 0 

0 0 0 -14 0 

0 0 0 0 -14 


-1 2 1 

13. P = -1-1 0 

1 0 1 


15. P = 1 1 0 

-1 0 1 

17. Not diagonalizable 


29. [M] Report your results and conclusions. You can avoid 
tedious calculations if you use the program gauss 
discussed in the Study Guide. 

Section 5.3, page 286 


" 226 -525 ■ 

3. 

_ a k 

0 " 

90 -209 

2(a k - b k ) 

b k 


29. One answer is P\ 


whose columns are 


-2 —1 

eigenvectors corresponding to the eigenvalues in D\. 
31. Hint: Construct a suitable 2x2 triangular matrix. 


9. a. 


b. Hint: Compute r(p + q) and 7"(c • p) for arbitrary p, q 


33. [M] P 


D 


"2 

0 

-1 

1 " 


- i 

i 

i 

7 

-1 

0 

-4 


i 

_ 上 

i 

0 

7 

0 

, c. 

1 

0 

2 

0 

1 

i 

1 _ 

0 

1 

0 

3 


i 

i 


-12 

0 


0 0 

-12 0 


0 0 13 0 

0 0 0 13 


11 . 


-2 -2 

0 -1 


13. h 


^ t>2 


15. bi 


b 2 



When an answer involves a diagonalization, $A = PDP^{-1}$, the
factors $P$ and $D$ are not unique, so your answer may differ from
that given here.


7. P 


0 


D 


9. Not diagonalizable 


11. P 


"1 

-1 

-1 ' 


"5 

0 

0" 

2 

1 

0 

,D = 

0 

-1 

0 

3 

0 

1 


0 

0 

-1 



5 

0 

0 

,D = 

0 

1 

0 


_0 

0 

1 


"0 

0 

0" 

,D = 

0 

1 

0 


0 

0 

1 


19. P 


1 

3 

-1 

—1 


5 

0 

0 

0 

0 

2 

-1 

2 

,D = 

0 

3 

0 

0 

0 

0 

1 

0 

0 

0 

2 

0 

0 

0 

0 

1 


0 

0 

0 

2 


001 

0 2 4 
I 

3 5 0 






































































17. a. $A\mathbf{b}_1 = 3\mathbf{b}_1$, so $\mathbf{b}_1$ is an eigenvector of $A$. However, $A$
has only one eigenvalue, $\lambda = 3$, and the eigenspace is
only one-dimensional, so $A$ is not diagonalizable.


b. 


0 3 


19. By definition, if $A$ is similar to $B$, there exists an invertible
matrix $P$ such that $P^{-1}AP = B$. (See Section 5.2.) Then
$B$ is invertible because it is the product of invertible
matrices. To show that $A^{-1}$ is similar to $B^{-1}$, use the
equation $P^{-1}AP = B$. See the Study Guide.

21. Hint: Review Practice Problem 2.

23. Hint: Compute

25. Hint: Write $A = PBP^{-1} = (PB)P^{-1}$, and use the trace
property.

27. For each $j$, $I(\mathbf{b}_j) = \mathbf{b}_j$. Since the standard coordinate
vector of any vector in $\mathbb{R}^n$ is just the vector itself,
$[I(\mathbf{b}_j)]_{\mathcal{E}} = \mathbf{b}_j$. Thus the matrix for $I$ relative to $\mathcal{B}$ and the
standard basis $\mathcal{E}$ is simply $[\,\mathbf{b}_1\ \mathbf{b}_2\ \cdots\ \mathbf{b}_n\,]$. This matrix
is precisely the change-of-coordinates matrix $P_{\mathcal{B}}$ defined in
Section 4.4.

29. The $\mathcal{B}$-matrix for the identity transformation is $I_n$, because
the $\mathcal{B}$-coordinate vector of the $j$th basis vector $\mathbf{b}_j$ is the $j$th
column of $I_n$.

-7 -2 -6" 


31. [M] 


-4 -6 
0 -1 


Section 5.5, page 300 
1 . A = 2 + /,[ _1 + r 

3. A = 3 + 2/, 


5. A = 4 + /, 


2 


X = 2 — i 

A = 3 — 2/ 
A = 4 — /, 


-1 + - 
4 


2 


7. $\lambda = \sqrt{3} \pm i$, $\varphi = \pi/6$ radian, $r = 2$
9. $\lambda = \pm 2i$, $\varphi = -\pi/2$ radians, $r = 2$
11. $\lambda = -\sqrt{3} \pm i$, $\varphi = -5\pi/6$ radian, $r = 2$

In Exercises 13-20, other answers are possible. Any $P$ that
makes $P^{-1}AP$ equal to the given $C$ or to $C^T$ is a satisfactory
answer. First find $P$; then compute $P^{-1}AP$.


13. P 

15. P 

17. P 

19. P 


C 


C 

c 

c 


2 

1 3 

-3 1 

-3 4 

-4 -3 

.96 -.28 
.28 .96 


21 . 


y : 


2 

-1 +2/ 

'-2 - 4/" 

-1 +2 / _ 

5 

5 


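23. (a) Properties of conjugates and the fact that $\overline{\mathbf{x}}^T = \overline{\mathbf{x}^T}$;
(b) $\overline{A\mathbf{x}} = A\overline{\mathbf{x}}$ and $A$ is real; (c) because $\mathbf{x}^T A\overline{\mathbf{x}}$ is a scalar and
hence may be viewed as a $1 \times 1$ matrix; (d) properties
of transposes; (e) $A^T = A$, definition of $q$

25. Hint: First write $\mathbf{x} = \text{Re}\,\mathbf{x} + i(\text{Im}\,\mathbf{x})$.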

-1 1-1 -1 — 

0-102 
10 0-2 
0 0 2 0 


27. [M] P 


C 


0 

0 

-4 

-10 


0 

0 

10 

-4 


Other choices are possible, but $C$ must equal $P^{-1}AP$.

Section 5.6, page 309 

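1. a. Hint: Find $c_1$, $c_2$ such that $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2$. Use this
representation and the fact that $\mathbf{v}_1$ and $\mathbf{v}_2$ are
eigenvectors of $A$ to compute $\mathbf{x}_1$.

b. In general, $\mathbf{x}_k = 5(3)^k\mathbf{v}_1 - 4(\tfrac{1}{3})^k\mathbf{v}_2$ for $k \geq 0$.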

3. When p = .2, the eigenvalues of A are .9 and .7, and 

' 2 ' 


= c x (.9) k 


+ c 2 (.l) k 


0 2LS k — OO 


The higher predation rate cuts down the owls’ food supply, 
and eventually both predator and prey populations perish. 

5. If p = .325, the eigenvalues are 1.05 and .55. Since 
1.05 > 1, both populations will grow at 5% per year. An 
eigenvector for 1.05 is (6,13), so eventually there will be 
approximately 6 spotted owls to every 13 (thousand) flying 
squirrels. 

7. a. The origin is a saddle point because A has one 

eigenvalue larger than 1 and one smaller than 1 (in 
absolute value). 

b. The direction of greatest attraction is given by the
eigenvector corresponding to the eigenvalue 1/3,
namely, $\mathbf{v}_2$. All vectors that are multiples of $\mathbf{v}_2$ are
attracted to the origin. The direction of greatest
repulsion is given by the eigenvector $\mathbf{v}_1$. All multiples
of $\mathbf{v}_1$ are repelled.

c. See the Study Guide. 

9. Saddle point; eigenvalues: 2, .5; direction of greatest 
repulsion: the line through (0,0) and (—1,1); direction of 
greatest attraction: the line through (0,0) and (1,4) 

11. Attractor; eigenvalues: .9, .8; greatest attraction: line 
through (0,0) and (5,4) 

13. Repeller; eigenvalues: 1.2, 1.1; greatest repulsion: line 
through (0,0) and (3,4) 




















































2 " 


"-1 " 

15. X* = vi + 

-3 

1 

+ .3(.2)* 

0 

1 


Vi as 


k — oo 

17. a. $A = \begin{bmatrix} 0 & 1.6 \\ .3 & .8 \end{bmatrix}$

b. The population is growing because the largest
eigenvalue of $A$ is 1.2, which is larger than 1 in
magnitude. The eventual growth rate is 1.2, which is
20% per year. The eigenvector $(4, 3)$ for $\lambda_1 = 1.2$
shows that there will be 4 juveniles for every 3 adults.

c. [M] The juvenile-adult ratio seems to stabilize after 
about 5 or 6 years. The Study Guide describes how to 
construct a matrix program to generate a data matrix 
whose columns list the numbers of juveniles and adults 
each year. Graphing the data is also discussed. 
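
The kind of matrix program described in part (c) might look like the sketch below. It iterates $\mathbf{x}_{k+1} = A\mathbf{x}_k$ for the stage matrix in part (a); the starting counts `(15.0, 10.0)` and the function names are assumptions made only for this illustration, not values taken from the answer.

```python
def step(state):
    """One year of the model x_{k+1} = A x_k with A = [[0, 1.6], [0.3, 0.8]]."""
    juveniles, adults = state
    return (1.6 * adults, 0.3 * juveniles + 0.8 * adults)


def simulate(state, years):
    """Return the list of (juveniles, adults) pairs for years 0..years."""
    history = [state]
    for _ in range(years):
        state = step(state)
        history.append(state)
    return history


if __name__ == "__main__":
    # Hypothetical starting population: 15 juveniles, 10 adults.
    for year, (j, a) in enumerate(simulate((15.0, 10.0), 8)):
        print(year, round(j, 2), round(a, 2), round(j / a, 4))
```

The last printed column is the juvenile-to-adult ratio, which can be compared with the 4 : 3 ratio given by the eigenvector in part (b); the year-over-year growth factor can be read off from consecutive rows in the same way.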

Section 5.7, page 317 


x(0 = 



3 

" — 

r 

e - 

— 2 


1 


3. 


2 


-3 




e—\ The origin is a saddle point. 


The direction of greatest attraction is the line through 
(—1,1) and the origin. The direction of greatest repulsion is 
the line through (—3,1) and the origin. 


5. 


7. SetP 


"r 


'r 

_3_ 

^ + 2 

_ i_ 


^ ^ ^ 1 o 1 e 6 ’. The origin is a repeller. The 

direction of greatest repulsion is the line through ( 1 ， 1 ) and 
the origin. 

~ "4 O' 

0 6 

A = PDP~ l . Substituting x = Py into x’ = Ax, we have 
d 


and D 


.Then 


dt 


(Py) = A{Py) 


PY = PDP~ l (Py) = PDy 
Left-multiplying by P~ l gives 

y f = Dy, or 

9. (complex solution): 

Cl 




■4 

0 " 




0 

6 

_ yiit) _ 


e (- 2 +i)t ^ 


o( — 2—0^ 


(real solution): 
cos t sin t 
cos t 


Cl 


e~ 2t + c 2 


sin? — cos t 
sin t 


The trajectories spiral in toward the origin. 
11 . (complex): ci 
(real): 
ci 


'-3 + 3 / ' 

■Xit , 

'-3 - 3 / " 

2 

e Mt + c 2 

2 


-3 cos 3t — 3 sin 3? 
2 cos 3? 


- c 2 


-3 sin 3t + 3 cos 3t 
2 sin 3t 


The trajectories are ellipses about the origin. 


13. (complex): c\ 
(real): C\ 


1 + / 

2 

cos 3t — sin 3t 
2 cos 3? 


(l+3/)f 


+ ^2 


e l + c 2 


0 (l-3i> 


sin 3t + cos 3t 
2 sin 3t 


The trajectories spiral out, away from the origin. 


15. [M] x(0 = d 

"-1" 

0 

e~ 2t + c 2 

—6 

1 

+ c 3 

"-4" 

1 


1 


5 


4 


The origin is a saddle point. A solution with C 3 = 0 is 
attracted to the origin. A solution with C\ = C 2 = 0 is 
repelled. 

17. [M] (complex): 

-3' 


Cl 



" 23 - 34 / " 

e l + c 2 

-9+14/ 

3 


e (5-\-2i)t 



L 23-f 

34/ _ 



C3 

-9 - 

-14/ 

e (5-2i)t 




「 Z 
-3 


23 cos 2, + 34sin2f 

(real): c\ 

1 

e l + c 2 

—9cos2? — 14 sin 2? 



1 


3cos2? 


23 sin 2t — 34 cos 2t 
c 3 —9sin2f + 14cos2r 

3 sin 2t 

The origin is a repeller. The trajectories spiral outward, 
away from the origin. 

-2 3/4' 


19. [M] A 


'vi(0' 

5 

_r 

^~- 5t — — 

'-3" 

- _ 

一 2 

_2_ 

2 

_ 2_ 


21. [M] A 


' iUt)' 


—20 sin 6 t 

.vcit) _ 


15 cos 6 t — 5sin6? 


Section 5.8, page 324 

1. Eigenvector: X 4 
A ^ 4.9978 
3. Eigenvector: X 4 

A ^ .9075 

c 「 -.7999 
5. x = 

estimated X = 

7. [M] 

X k : 


"1 ' 

_ .3326 _ 

, or AX 4 = 

'.5188' 

, or Ax^ = 

1 


4.9978 

1.6652 

.4594 

.9075 


,Ax = 
-5.0020 


4.0015 

-5.0020 


•75 


1 


•9932 


1 


1 

' 

.9565 

' 

1 


.9990 

1 


.9998 


li k : 11.5 ， 12.78, 12.96, 12.9948, 12.9990 

9. [M] jis = 8.4233, = 8.4246; actual value: 8.42443 

(accurate to 5 places) 












































































































11. $\mu_k$: 5.8000, 5.9655, 5.9942, 5.9990 ($k = 1, 2, 3, 4$);
$R(\mathbf{x}_k)$: 5.9655, 5.9990, 5.99997, 5.9999993

13. Yes, but the sequences may converge very slowly. 


for each diagonal entry of $D$, because these entries in $D$ are
the eigenvalues of $A$. Thus $p(D)$ is the zero matrix. Thus
$p(A) = P \cdot 0 \cdot P^{-1} = 0$.


15. Hint: Write $A\mathbf{x} - \alpha\mathbf{x} = (A - \alpha I)\mathbf{x}$, and use the fact that
$(A - \alpha I)$ is invertible when $\alpha$ is not an eigenvalue of $A$.

17. [M] $\nu_0 = 3.3384$, $\nu_1 = 3.32119$ (accurate to 4 places with
rounding), $\nu_2 = 3.3212209$. Actual value: 3.3212201
(accurate to 7 places)


9. If $I - A$ were not invertible, then the equation
$(I - A)\mathbf{x} = \mathbf{0}$ would have a nontrivial solution $\mathbf{x}$. Then
$\mathbf{x} - A\mathbf{x} = \mathbf{0}$ and $A\mathbf{x} = 1 \cdot \mathbf{x}$, which shows that $A$ would
have 1 as an eigenvalue. This cannot happen if all the
eigenvalues are less than 1 in magnitude. So $I - A$ must be
invertible.


19. [M] a. $\mu_6 = 30.2887 = \mu_7$ to four decimal places. To six
places, the largest eigenvalue is 30.288685, with
eigenvector $(.957629, .688937, 1, .943782)$.
b. The inverse power method (with $\alpha = 0$) produces
$\mu_1^{-1} = .010141$, $\mu_2^{-1} = .010150$. To seven places, the
smallest eigenvalue is .0101500, with eigenvector
$(-.603972, 1, -.251135, .148953)$. The reason for the
rapid convergence is that the next-to-smallest
eigenvalue is near .85.

21. a. If the eigenvalues of $A$ are all less than 1 in magnitude,
and if $\mathbf{x} \neq \mathbf{0}$, then $A^k\mathbf{x}$ is approximately an eigenvector
for large $k$.

b. If the strictly dominant eigenvalue is 1, and if $\mathbf{x}$ has a
component in the direction of the corresponding
eigenvector, then $\{A^k\mathbf{x}\}$ will converge to a multiple of
that eigenvector.

c. If the eigenvalues of $A$ are all greater than 1 in
magnitude, and if $\mathbf{x}$ is not an eigenvector, then the
distance from $A^k\mathbf{x}$ to the nearest eigenvector will
increase as $k \to \infty$.


Chapter 5 Supplementary Exercises, page 326 


1. a. T   b. F   c. T   d. F   e. T   f. T   g. F   h. T   i. F   j. T   k. F   l. F
m. F   n. T   o. F   p. T   q. F   r. T   s. F   t. T   u. T   v. T   w. F   x. T


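3. a. Suppose $A\mathbf{x} = \lambda\mathbf{x}$, with $\mathbf{x} \neq \mathbf{0}$. Then
$(5I - A)\mathbf{x} = 5\mathbf{x} - A\mathbf{x} = 5\mathbf{x} - \lambda\mathbf{x} = (5 - \lambda)\mathbf{x}$.
The eigenvalue is $5 - \lambda$.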


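11. a. Take $\mathbf{x}$ in $H$. Then $\mathbf{x} = c\mathbf{u}$ for some scalar $c$. So
$A\mathbf{x} = A(c\mathbf{u}) = c(A\mathbf{u}) = c(\lambda\mathbf{u}) = (c\lambda)\mathbf{u}$, which shows
that $A\mathbf{x}$ is in $H$.

b. Let $\mathbf{x}$ be a nonzero vector in $K$. Since $K$ is
one-dimensional, $K$ must be the set of all scalar
multiples of $\mathbf{x}$. If $K$ is invariant under $A$, then $A\mathbf{x}$ is in
$K$ and hence $A\mathbf{x}$ is a multiple of $\mathbf{x}$. Thus $\mathbf{x}$ is an
eigenvector of $A$.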

13. 1,3,7 

15. Replace $a$ by $a - \lambda$ in the determinant formula from
Exercise 16 in Chapter 3 Supplementary Exercises:

$\det(A - \lambda I) = (a - b - \lambda)^{n-1}[a - \lambda + (n - 1)b]$

This determinant is zero only if $a - b - \lambda = 0$ or
$a - \lambda + (n - 1)b = 0$. Thus $\lambda$ is an eigenvalue of $A$ if and
only if $\lambda = a - b$ or $\lambda = a + (n - 1)b$. From the formula
for $\det(A - \lambda I)$ above, the algebraic multiplicity is $n - 1$
for $a - b$ and 1 for $a + (n - 1)b$.

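17. $\det(A - \lambda I) = (a_{11} - \lambda)(a_{22} - \lambda) - a_{12}a_{21}$ 
$= \lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21})$ 
$= \lambda^2 - (\operatorname{tr} A)\lambda + \det A$. Use the quadratic formula to solve 
the characteristic equation: 

$$\lambda = \frac{\operatorname{tr} A \pm \sqrt{(\operatorname{tr} A)^2 - 4\det A}}{2}$$

The eigenvalues are both real if and only if the discriminant 
is nonnegative, that is, $(\operatorname{tr} A)^2 - 4\det A \ge 0$. This inequality 
simplifies to $(\operatorname{tr} A)^2 \ge 4\det A$ and 
$\left(\dfrac{\operatorname{tr} A}{2}\right)^2 \ge \det A$. 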
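The trace and determinant form of the characteristic equation in Exercise 17 translates directly into a small routine. This is a sketch only; the function name `eigenvalues_2x2` and the sample matrices are illustrative assumptions, not part of the text.

```python
import math

def eigenvalues_2x2(a11, a12, a21, a22):
    """Real eigenvalues of a 2 x 2 matrix from its trace and determinant.

    Returns a pair (larger, smaller), or None when the discriminant
    (tr A)^2 - 4 det A is negative and the eigenvalues are not real.
    """
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace * trace - 4.0 * det
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return ((trace + root) / 2.0, (trace - root) / 2.0)

# Hypothetical examples: a symmetric matrix and a rotation by 90 degrees.
real_case = eigenvalues_2x2(2, 1, 1, 2)      # trace 4, det 3
complex_case = eigenvalues_2x2(0, -1, 1, 0)  # trace 0, det 1
```

The rotation matrix has $(\operatorname{tr} A/2)^2 = 0 < 1 = \det A$, so the test in the answer correctly reports that its eigenvalues are not real.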


b. $(5I - 3A + A^2)x = 5x - 3Ax + A(Ax)$ 

$= 5x - 3\lambda x + \lambda^2 x$ 
$= (5 - 3\lambda + \lambda^2)x$. 

The eigenvalue is $5 - 3\lambda + \lambda^2$. 

5. Suppose $Ax = \lambda x$, with $x \neq 0$. Then 

$p(A)x = (c_0 I + c_1 A + c_2 A^2 + \cdots + c_n A^n)x$ 

$= c_0 x + c_1 Ax + c_2 A^2 x + \cdots + c_n A^n x$ 

$= c_0 x + c_1 \lambda x + c_2 \lambda^2 x + \cdots + c_n \lambda^n x = p(\lambda)x$ 

So $p(\lambda)$ is an eigenvalue of the matrix p(A). 

7. If $A = PDP^{-1}$, then $p(A) = P\,p(D)\,P^{-1}$, as shown in 
Exercise 6. If the (j, j) entry in D is $\lambda$, then the (j, j) 
entry in $D^k$ is $\lambda^k$, and so the (j, j) entry in p(D) is $p(\lambda)$. 
If p is the characteristic polynomial of A, then $p(\lambda) = 0$ 


19. $C_p = \begin{bmatrix} 0 & 1 \\ -6 & 5 \end{bmatrix}$; $\det(C_p - \lambda I) = 6 - 5\lambda + \lambda^2 = p(\lambda)$ 
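The identity in Exercise 19 can be spot-checked by evaluating both sides at a few values of $\lambda$. The sketch below assumes the usual companion-matrix layout for $p(t) = 6 - 5t + t^2$ (ones above the diagonal, negated coefficients in the last row); the helper names are illustrative.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def p(t):
    """The polynomial p(t) = 6 - 5t + t^2 from Exercise 19."""
    return 6 - 5 * t + t * t

# Assumed companion matrix of p: last row holds -a0 and -a1.
C_p = [[0, 1],
       [-6, 5]]

def char_poly_at(lam):
    """Evaluate det(C_p - lam * I) at a given number lam."""
    shifted = [[C_p[0][0] - lam, C_p[0][1]],
               [C_p[1][0], C_p[1][1] - lam]]
    return det2(shifted)

# Compare the two sides at several integer sample points.
checks = [char_poly_at(lam) == p(lam) for lam in range(-3, 4)]
```

Since two quadratics that agree at more than two points are identical, agreement on these seven sample points is enough to confirm the identity for this particular p.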


21. If p is a polynomial of order 2, then a calculation such as in 
Exercise 19 shows that the characteristic polynomial of $C_p$ 
is $p(\lambda) = (-1)^2 p(\lambda)$, so the result is true for n = 2. 
Suppose the result is true for n = k for some $k \ge 2$, and 
consider a polynomial p of degree k + 1. Then, expanding 
$\det(C_p - \lambda I)$ by cofactors down the first column, the 
determinant of $C_p - \lambda I$ equals 

$$(-\lambda)\det\begin{bmatrix} -\lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & & & 1 \\ -a_1 & -a_2 & \cdots & -a_k - \lambda \end{bmatrix} + (-1)^{k+1}a_0$$

The k x k matrix shown is $C_q - \lambda I$, where 
$q(t) = a_1 + a_2 t + \cdots + a_k t^{k-1} + t^k$. By the induction 
assumption, the determinant of $C_q - \lambda I$ is $(-1)^k q(\lambda)$. Thus 

$$\det(C_p - \lambda I) = (-1)^{k+1}a_0 + (-\lambda)(-1)^k q(\lambda)$$
$$= (-1)^{k+1}\,[a_0 + \lambda(a_1 + \cdots + a_k\lambda^{k-1} + \lambda^k)]$$
$$= (-1)^{k+1}p(\lambda)$$

So the formula holds for n = k + 1 when it holds for 
n = k. By the principle of induction, the formula for 
$\det(C_p - \lambda I)$ is true for all $n \ge 2$. 

23. From Exercise 22, the columns of the Vandermonde matrix 
V are eigenvectors of $C_p$, corresponding to the eigenvalues 
$\lambda_1, \lambda_2, \lambda_3$ (the roots of the polynomial p). Since these 
eigenvalues are distinct, the eigenvectors form a linearly 
independent set, by Theorem 2 in Section 5.1. Thus V has 
linearly independent columns and hence is invertible, by the 
Invertible Matrix Theorem. Finally, since the columns of V 
are eigenvectors of $C_p$, the Diagonalization Theorem 
(Theorem 5 in Section 5.3) shows that $V^{-1}C_pV$ is diagonal. 

25. [M] If your matrix program computes eigenvalues and 
eigenvectors by iterative methods rather than symbolic 
calculations, you may have some difficulties. You should 
find that AP - PD has extremely small entries and 
$PDP^{-1}$ is close to A. (This was true just a few years ago, 
but the situation could change as matrix programs continue 
to improve.) If you constructed P from the program’s 
eigenvectors, check the condition number of P . This may 
indicate that you do not really have three linearly 
independent eigenvectors. 


Chapter 6 

Section 6.1, page 336 

8/13 
12/13 



- r ~ 



7 . 拉 9. 

—.0 

• 8 _ 

11 . 

2/V^69 

_4/v/69_ 


13. $5\sqrt{5}$   15. Not orthogonal   17. Orthogonal 

19. Refer to the Study Guide after you have written your 
answers. 

21. Hint: Use Theorems 3 and 2 from Section 2.1. 

23. $u \cdot v = 0$, $\|u\|^2 = 30$, $\|v\|^2 = 101$, 

$\|u + v\|^2 = (-5)^2 + (-9)^2 + 5^2 = 131 = 30 + 101$ 

25. The set of all multiples of $\begin{bmatrix} -b \\ a \end{bmatrix}$ (when $v \neq 0$) 

27. Hint: Use the definition of orthogonality. 

29. Hint: Consider a typical vector $w = c_1v_1 + \cdots + c_pv_p$ in 
W. 


5,8, 


3. 


3/35 

-1/35 

-1/7 


31. Hint: If x is in $W^\perp$, then x is orthogonal to every vector in W. 

33. [M] State your conjecture and verify it algebraically. 


Section 6.2, page 344 

1. Not orthogonal 3. Not orthogonal 5. Orthogonal 

7. Show $u_1 \cdot u_2 = 0$, mention Theorem 4, and observe that two 
linearly independent vectors in $\mathbb{R}^2$ form a basis. Then 
obtain 


+ 


52 


9. Show $u_1 \cdot u_2 = 0$, $u_1 \cdot u_3 = 0$, and $u_2 \cdot u_3 = 0$. Mention 
Theorem 4, and observe that three linearly independent 
vectors in $\mathbb{R}^3$ form a basis. Then obtain 


5U1 


i u 2 - 


fu 3 


9U1 - ^U 2 + 2u 3 


'- 2 ' 

1 飞 v — 

'-4/5' 

_i_ 

'14/5" 

1 


7/5 _ 

十 

_ 8/5 _ 


11 . 


15. y-y 


distance is 1 



'1/V3- 

"- 1 /V 2 ' 

17. 

1/V3 

0 



_ 1 /V 2 . 


19. Orthonormal 


21. Orthonormal 


23. See the Study Guide. 

25. Hint: $\|Ux\|^2 = (Ux)^T(Ux)$. Also, parts (a) and (c) follow 
from (b). 

27. Hint: You need two theorems, one of which applies only to 
square matrices. 

29. Hint: If you have a candidate for an inverse, you can check 
to see whether the candidate works. 

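31. Suppose $\hat{y} = \dfrac{y \cdot u}{u \cdot u}\,u$. Replace u by cu with $c \neq 0$; then 

$$\frac{y \cdot (cu)}{(cu) \cdot (cu)}\,(cu) = \frac{c\,(y \cdot u)}{c^2\,(u \cdot u)}\,(c)u = \hat{y}$$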


33. Let L = Span {u}, where u is nonzero, and let 
$T(x) = \operatorname{proj}_L x$. By definition, 

$$T(x) = \frac{x \cdot u}{u \cdot u}\,u = (x \cdot u)(u \cdot u)^{-1}u$$

For x and y in $\mathbb{R}^n$ and any scalars c and d, properties of the 
inner product (Theorem 1) show that 

$T(cx + dy) = [(cx + dy) \cdot u](u \cdot u)^{-1}u$ 

$= [c(x \cdot u) + d(y \cdot u)](u \cdot u)^{-1}u$ 
$= c(x \cdot u)(u \cdot u)^{-1}u + d(y \cdot u)(u \cdot u)^{-1}u$ 
$= cT(x) + dT(y)$ 


Thus T is linear. 
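The linearity argument in Exercise 33 can be checked on concrete numbers. The vectors and scalars below are hypothetical values chosen for illustration; `proj` implements the formula $T(x) = (x \cdot u)(u \cdot u)^{-1}u$ from the answer.

```python
def dot(u, v):
    """Standard dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def proj(x, u):
    """Orthogonal projection of x onto the line spanned by a nonzero u."""
    scale = dot(x, u) / dot(u, u)
    return [scale * ui for ui in u]

# Hypothetical data, not from the exercise.
u = [1.0, 2.0, 2.0]
x = [3.0, 0.0, 1.0]
y = [-1.0, 4.0, 2.0]
c, d = 2.0, -3.0

# Left side: project the combination. Right side: combine the projections.
lhs = proj([c * xi + d * yi for xi, yi in zip(x, y)], u)
rhs = [c * a + d * b for a, b in zip(proj(x, u), proj(y, u))]
```

Up to floating-point rounding, `lhs` and `rhs` agree entry by entry, as the algebraic proof predicts.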


































15. V40 


l not 


17. See the Study Guide. 


3. a. 


5. x ： 


9. a. b : 


6 

6 

X\ 


6 

_6 

42 

_x 2 _ 


—6 


b. x 


4/3 

-1/3 


5' 


"-1 " 

-3 

+ X 3 

1 

0 


1 


b. x 


7. 2V5 


2/7 

1/7 



3' 

1 


"2/3" 

a. b = 

4 

-1 _ 

b. x = 

0 



_l/3_ 


19. Any multiple of 


21. Write your answers before checking the Study Guide. 


23. 


Seel 


5. 


9. 


Section 6.3, page 352 
1. X = -§Ui - |u 2 + |u 3 + 2 u 4 ; X = 



"- 1 " 


"-1 " 

3. 

4 

5. 

2 


0 


6 



11 


7 

Au = 

-11 

, A\ = 

-12 


11 


7 



0 " 


4" 

b — Au = 

2 

,b — ^4v = 

3 


—6 


-2 


No, u could not possibly be a least-squares solution of Ax = b. Why? 


2 


2 

4 

,(UU T )y = 

4 

5 


5 


2 


2 

4 


-1 

0 

+ 

3 

0 


-1 


Hint: Use Theorem 3 and the Orthogonal Decomposition 
Theorem. For the uniqueness, suppose $Ap = b$ and 
$Ap_1 = b$, and consider the equations $p = p_1 + (p - p_1)$ 
and $p = p + 0$. 


Section 6.4, page 358 


3. 


3/2 

3/2 


7. 


2/^30' 

-5/V30 

l/\/30. 


2/V6' 

1/V6 

1/V6. 


11 . 


1 


3 


2 

-1 


0 


0 

-1 

, 

3 

, 

2 

1 


-3 


2 

1 


3 


-2 


17. See the Study Guide. 

19. Suppose x satisfies Rx = 0; then QRx = Q0 = 0, and 
Ax = 0. Since the columns of A are linearly independent, x 
must be zero. This fact, in turn, shows that the columns of 
R are linearly independent. Since R is square, it is 
invertible, by the Invertible Matrix Theorem. 
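The QR factorization discussed in Exercise 19 can be sketched with the classical Gram-Schmidt process. The routine below is an illustration under stated assumptions: the input columns are linearly independent, the matrix is a hypothetical 4 x 3 example, and the function name `gram_schmidt_qr` is not from the text. With independent columns every diagonal entry of R is a positive length, which is consistent with R being invertible.

```python
import math

def gram_schmidt_qr(columns):
    """Classical Gram-Schmidt QR for a list of linearly independent columns.

    Returns (q_columns, R), where the q_columns are orthonormal and R is
    upper triangular with positive diagonal entries.
    """
    n = len(columns)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        v = [float(ai) for ai in a]
        for i, q in enumerate(q_cols):
            # Coefficient of a along the already-built unit vector q.
            R[i][j] = sum(qi * ai for qi, ai in zip(q, a))
            v = [vi - R[i][j] * qi for vi, qi in zip(v, q)]
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        q_cols.append([vi / R[j][j] for vi in v])
    return q_cols, R

# Hypothetical matrix, stored column by column.
cols = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
Q, R = gram_schmidt_qr(cols)
diag_positive = all(R[j][j] > 0 for j in range(3))
```

For production work a Householder-based routine is usually preferred for numerical stability; classical Gram-Schmidt is used here only because it mirrors the hand computation.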

21. Denote the columns of Q by $q_1, \ldots, q_n$. Note that $n \le m$, 
because A is m x n and has linearly independent columns. 
Use the fact that the columns of Q can be extended to an 
orthonormal basis for $\mathbb{R}^m$, say, $\{q_1, \ldots, q_m\}$. (The Study 
Guide describes one method.) Let $Q_0 = [\,q_{n+1} \ \cdots \ q_m\,]$ 
and $Q_1 = [\,Q \ \ Q_0\,]$. Then, using partitioned matrix 
multiplication, $Q_1 \begin{bmatrix} R \\ O \end{bmatrix} = QR = A$. 

23. Hint: Partition R as a 2 x 2 block matrix. 

25. [M] The diagonal entries of R are 20, 6, 10.3923, and 
7.0711, to four decimal places. 

Section 6.5, page 366 



6 

- 11 " 

Xi 


'-4" 


'3" 

a. 




— 


b. x = 



-11 

22 

x 2 


11 


2 


13. R 


6 

12 


4 

0 

6 

15. x = 

-1 



3 


-1 

11 . 

-1 

13. 

-3 

-2 


1 



-1 


3 


17. a. U T U = p ° , UU T 
b. proj^ y = 6 ui + 3u 2 = 


10/3 

7. y = 2/3 + 

8/3 


-7/3 

7/3 9. y = 

7/3 


0 


0 

2/5 

,such as 

2 

_l/5 


1 


9 9 9 
/ / / 
2 4 5 


9 9 9 
/ / / 
2 5 4 


9 9 9 
/ / / 
8 2 2 


2 2 2 2 
/ o / / / 

li 11 11 11 


0//2 


/ 


^V5^V5V5 
/ / / / / 

11 11 11 11 11 


75 

4 

V5 

I 


2 

15 . 




0 6 2 
1 _ I 


+ 


0 2 4 2 
19. a. If $Ax = 0$, then $A^TAx = A^T0 = 0$. This shows that 
Nul A is contained in Nul $A^TA$. 
b. If $A^TAx = 0$, then $x^TA^TAx = x^T0 = 0$. So 
$(Ax)^T(Ax) = 0$ (which means that $\|Ax\|^2 = 0$), and 
hence $Ax = 0$. This shows that Nul $A^TA$ is contained in 
Nul A. 


21. Hint: For part (a), use an important theorem from Chapter 2. 


23. By Theorem 14, $\hat{b} = A\hat{x} = A(A^TA)^{-1}A^Tb$. The matrix 
$A(A^TA)^{-1}A^T$ occurs frequently in statistics, where it is 
sometimes called the hat-matrix. 

25. The normal equations are $\begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \end{bmatrix}$, whose 
solution is the set of (x, y) such that x + y = 3. The 
solutions correspond to points on the line midway between 
the lines x + y = 2 and x + y = 4. 


Section 6.6, page 374 


1. y = .9 + .4x   3. y = 1.1 + 1.3x 


5. If two data points have different x-coordinates, then the two 
columns of the design matrix X cannot be multiples of each 
other and hence are linearly independent. By Theorem 14 
in Section 6.5, the normal equations have a unique solution. 


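7. a. $y = X\beta + \epsilon$, where 
$y = \begin{bmatrix} 1.8 \\ 2.7 \\ 3.4 \\ 3.8 \\ 3.9 \end{bmatrix}$, $X = \begin{bmatrix} 1 & 1 \\ 2 & 4 \\ 3 & 9 \\ 4 & 16 \\ 5 & 25 \end{bmatrix}$, $\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}$, $\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \end{bmatrix}$ 

b. [M] $y = 1.76x - .20x^2$ 

9. $y = X\beta + \epsilon$, where 
$y = \begin{bmatrix} 7.9 \\ 5.4 \\ -.9 \end{bmatrix}$, $X = \begin{bmatrix} \cos 1 & \sin 1 \\ \cos 2 & \sin 2 \\ \cos 3 & \sin 3 \end{bmatrix}$, $\beta = \begin{bmatrix} A \\ B \end{bmatrix}$, $\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix}$ 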
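The coefficients in Exercise 7(b) can be reproduced by solving the normal equations $X^TX\beta = X^Ty$ by hand for the two-column design matrix of Exercise 7(a). This is a minimal sketch; the variable names are illustrative, and the 2 x 2 system is solved with the explicit inverse formula rather than a library routine.

```python
# Data from Exercise 7(a): X has columns x and x^2.
xs = [1, 2, 3, 4, 5]
ys = [1.8, 2.7, 3.4, 3.8, 3.9]

# Entries of X^T X and X^T y.
s_x2 = sum(x ** 2 for x in xs)                     # sum of x^2
s_x3 = sum(x ** 3 for x in xs)                     # sum of x^3
s_x4 = sum(x ** 4 for x in xs)                     # sum of x^4
s_xy = sum(x * y for x, y in zip(xs, ys))          # sum of x * y
s_x2y = sum(x ** 2 * y for x, y in zip(xs, ys))    # sum of x^2 * y

# Solve [[s_x2, s_x3], [s_x3, s_x4]] beta = [s_xy, s_x2y] by Cramer's rule.
det = s_x2 * s_x4 - s_x3 ** 2
beta1 = (s_x4 * s_xy - s_x3 * s_x2y) / det
beta2 = (s_x2 * s_x2y - s_x3 * s_xy) / det
```

Rounded to two decimals, `beta1` and `beta2` match the stated fit $y = 1.76x - .20x^2$.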



11. [M] $\beta = 1.45$ and e = .811; the orbit is an ellipse. The 
equation $r = \beta/(1 - e \cdot \cos\vartheta)$ produces r = 1.33 when 
$\vartheta = 4.6$. 


13. [M] a. $y = -.8558 + 4.7025t + 5.5554t^2 - .0274t^3$ 
b. The velocity function is 

$v(t) = 4.7025 + 11.1108t - .0822t^2$, and 
v(4.5) = 53.0 ft/sec. 

15. Hint: Write X and y as in equation (1), and compute $X^TX$ 
and $X^Ty$. 

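17. a. The mean of the x-data is $\bar{x} = 5.5$. The data in 
mean-deviation form are (-3.5, 1), (-.5, 2), (1.5, 3), 
and (2.5, 3). The columns of X are orthogonal because 
the entries in the second column sum to 0. 

b. $\begin{bmatrix} 4 & 0 \\ 0 & 21 \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 9 \\ 7.5 \end{bmatrix}$ 

$y = \frac{9}{4} + \frac{5}{14}x^* = \frac{9}{4} + \frac{5}{14}(x - 5.5)$ 

19. Hint: The equation has a nice geometric interpretation. 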


Section 6.7, page 382 

1. a. 3, $\sqrt{105}$, 225 b. All multiples of : 

3. 28 5. $5\sqrt{2}$, $3\sqrt{3}$ 7. § + 

9. a. Constant polynomial, p(t) = 5 

b. $t^2 - 5$ is orthogonal to $p_0$ and $p_1$; values: 

(4, -4, -4, 4); answer: $q(t) = \frac{1}{4}(t^2 - 5)$ 

11 . l it 

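13. Verify each of the four axioms. For instance: 

1. $\langle u, v \rangle = (Au) \cdot (Av)$   Definition 

$= (Av) \cdot (Au)$   Property of the dot product 
$= \langle v, u \rangle$   Definition 


15. $\langle u, cv \rangle = \langle cv, u \rangle$   Axiom 1 
$= c\langle v, u \rangle$   Axiom 3 
$= c\langle u, v \rangle$   Axiom 1 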

17. Hint: Compute 4 times the right-hand side. 


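19. $\langle u, v \rangle = \sqrt{a}\sqrt{b} + \sqrt{b}\sqrt{a} = 2\sqrt{ab}$, 

$\|u\|^2 = (\sqrt{a})^2 + (\sqrt{b})^2 = a + b$. Since a and b are 
nonnegative, $\|u\| = \sqrt{a + b}$. Similarly, $\|v\| = \sqrt{b + a}$. 
By Cauchy-Schwarz, $2\sqrt{ab} \le \sqrt{a + b}\,\sqrt{b + a} = a + b$. 

Hence, $\sqrt{ab} \le \dfrac{a + b}{2}$. 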


21. 0   23. $2/\sqrt{5}$   25. $1,\ t,\ 3t^2 - 1$ 


27. [M] The new orthogonal polynomials are multiples of 
$-17t + 5t^3$ and $72 - 155t^2 + 35t^4$. Scale these 
polynomials so their values at -2, -1, 0, 1, and 2 are small 
integers. 


Section 6.8, page 389 

1. y = 2 

3. $p(t) = 4p_0 - .1p_1 - .5p_2 + .2p_3$ 

$= 4 - .1t - .5(t^2 - 2) + .2\left(\frac{5}{6}t^3 - \frac{17}{6}t\right)$ 

(This polynomial happens to fit the data exactly.) 

5. Use the identity 

$\sin mt \sin nt = \frac{1}{2}[\cos(mt - nt) - \cos(mt + nt)]$ 

7. Use the identity $\cos^2 kt = \dfrac{1 + \cos 2kt}{2}$. 

9. $\pi + 2\sin t + \sin 2t + \frac{2}{3}\sin 3t$ [Hint: Save time by using 
the results from Example 4.] 

11. $\frac{1}{2} - \frac{1}{2}\cos 2t$ (Why?) 








































0 V5/3 

\/V5 -4/V45 
-2/V5 -2/sf^5 


equations is Ax = b, and the set of all least-squares 
solutions coincides with the set of solutions of 
$A^TAx = A^Tb$ (Theorem 13 in Section 6.5). Study this 
equation, and use the fact that $(vv^T)x = v(v^Tx) = (v^Tx)v$, 
because $v^Tx$ is a scalar. 

13. a. The row-column calculation of Au shows that each row 
of A is orthogonal to every u in Nul A. So each row of 
A is in $(\operatorname{Nul} A)^\perp$. Since $(\operatorname{Nul} A)^\perp$ is a subspace, it must 
contain all linear combinations of the rows of A; hence 
$(\operatorname{Nul} A)^\perp$ contains Row A. 

b. If rank A = r, then dim Nul A = n - r, by the Rank 
Theorem. By Exercise 24(c) in Section 6.3, 

$\dim \operatorname{Nul} A + \dim (\operatorname{Nul} A)^\perp = n$ 

So $\dim (\operatorname{Nul} A)^\perp$ must be r. But Row A is an 
r-dimensional subspace of $(\operatorname{Nul} A)^\perp$, by the Rank 
Theorem and part (a). Therefore, Row A must coincide 
with $(\operatorname{Nul} A)^\perp$. 

c. Replace A by $A^T$ in part (b) and conclude that Row $A^T$ 
coincides with $(\operatorname{Nul} A^T)^\perp$. Since Row $A^T$ = Col A, this 
proves (c). 

15. If $A = URU^T$ with U orthogonal, then A is similar to R 
(because U is invertible and $U^T = U^{-1}$) and so A has the 
same eigenvalues as R (by Theorem 4 in Section 5.2), 
namely, the n real numbers on the diagonal of R. 

17. [M] $\dfrac{\|\Delta x\|}{\|x\|} = .4618$, 
$\operatorname{cond}(A) \times \dfrac{\|\Delta b\|}{\|b\|} = 3363 \times (1.548 \times 10^{-4}) = .5206$. 

Observe that $\|\Delta x\|/\|x\|$ almost equals cond(A) times 
$\|\Delta b\|/\|b\|$. 

19. [M] $\dfrac{\|\Delta x\|}{\|x\|} = 7.178 \times 10^{-8}$, $\dfrac{\|\Delta b\|}{\|b\|} = 2.832 \times 10^{-4}$. 

Observe that the relative change in x is much smaller than 
the relative change in b. In fact, since 

$\operatorname{cond}(A) \times \dfrac{\|\Delta b\|}{\|b\|} = 23{,}683 \times (2.832 \times 10^{-4}) = 6.707$ 

the theoretical bound on the relative change in x is 6.707 (to 
four significant figures). This exercise shows that even 
when a condition number is large, the relative error in a 
solution need not be as large as you might expect. 


Chapter 7 

Section 7.1, page 399 

1. Symmetric 3. Not symmetric 


5. Not symmetric 


7. Orthogonal, 


.8 

-.6 


9. Not orthogonal 


13. Hint: Take functions f and g in $C[0, 2\pi]$, and fix an 
integer $m \ge 0$. Write the Fourier coefficient of f + g that 
involves cos mt, and write the Fourier coefficient that 
involves sin mt (m > 0). 

15. [M] The cubic curve is the graph of 
$g(t) = -.2685 + 3.6095t + 5.8576t^2 - .0477t^3$. The 
velocity at t = 4.5 seconds is $g'(4.5) = 53.4$ ft/sec. This is 
about .7% faster than the estimate obtained in Exercise 13 
in Section 6.6. 
1. a. F  b. T  c. T  d. F  e. F  f. T  g. T  h. T  i. F  j. T 
   k. T  l. F  m. T  n. F  o. F  p. T  q. T  r. F  s. F 


2. Hint: If $\{v_1, v_2\}$ is an orthonormal set and $x = c_1v_1 + c_2v_2$, 
then the vectors $c_1v_1$ and $c_2v_2$ are orthogonal, and 

$\|x\|^2 = \|c_1v_1 + c_2v_2\|^2 = \|c_1v_1\|^2 + \|c_2v_2\|^2$ 

$= (|c_1|\,\|v_1\|)^2 + (|c_2|\,\|v_2\|)^2 = |c_1|^2 + |c_2|^2$ 

(Explain why.) So the stated equality holds for p = 2. 
Suppose that the equality holds for p = k, with $k \ge 2$, let 
$\{v_1, \ldots, v_{k+1}\}$ be an orthonormal set, and consider 

$x = c_1v_1 + \cdots + c_kv_k + c_{k+1}v_{k+1} = u_k + c_{k+1}v_{k+1}$, 

where $u_k = c_1v_1 + \cdots + c_kv_k$. 

3. Given x and an orthonormal set $\{v_1, \ldots, v_p\}$ in $\mathbb{R}^n$, let $\hat{x}$ be 
the orthogonal projection of x onto the subspace spanned by 
$v_1, \ldots, v_p$. By Theorem 10 in Section 6.3, 

$\hat{x} = (x \cdot v_1)v_1 + \cdots + (x \cdot v_p)v_p$ 

By Exercise 2, $\|\hat{x}\|^2 = |x \cdot v_1|^2 + \cdots + |x \cdot v_p|^2$. Bessel's 
inequality follows from the fact that $\|\hat{x}\|^2 \le \|x\|^2$, noted 
before the statement of the Cauchy-Schwarz inequality, in 
Section 6.7. 
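Bessel's inequality from Exercise 3 is easy to test on a small case. The orthonormal pair and the vector x below are hypothetical values chosen so the arithmetic is simple; they are not taken from the exercise.

```python
def dot(u, v):
    """Standard dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical orthonormal set in R^3 (each has length 1, and v1 . v2 = 0).
v1 = [1.0, 0.0, 0.0]
v2 = [0.0, 0.6, 0.8]
x = [2.0, 1.0, -3.0]

# Coefficients x . v_i of the projection of x onto Span{v1, v2}.
coeffs = [dot(x, v) for v in (v1, v2)]

bessel_sum = sum(c * c for c in coeffs)  # |x.v1|^2 + |x.v2|^2
norm_sq = dot(x, x)                      # ||x||^2
```

Here `bessel_sum` is the squared length of the projection and `norm_sq` is the squared length of x, so the first can never exceed the second; the gap is the squared distance from x to the subspace.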

5. Suppose $(Ux) \cdot (Uy) = x \cdot y$ for all x, y in $\mathbb{R}^n$, and let 
$e_1, \ldots, e_n$ be the standard basis for $\mathbb{R}^n$. For 
$j = 1, \ldots, n$, $Ue_j$ is the jth column of U. Since 
$\|Ue_j\|^2 = (Ue_j) \cdot (Ue_j) = e_j \cdot e_j = 1$, the columns of U 
are unit vectors; since $(Ue_j) \cdot (Ue_k) = e_j \cdot e_k = 0$ for 
$j \neq k$, the columns are pairwise orthogonal. 

7. Hint: Compute $Q^TQ$, using the fact that 
$(uu^T)^T = u^{TT}u^T = uu^T$. 

9. Let W = Span {u, v}. Given z in $\mathbb{R}^n$, let $\hat{z} = \operatorname{proj}_W z$. Then 
$\hat{z}$ is in Col A, where A = [u v], say, $\hat{z} = A\hat{x}$ for some $\hat{x}$ 
in $\mathbb{R}^2$. So $\hat{x}$ is a least-squares solution of Ax = z. The 
normal equations can be solved to produce $\hat{x}$, and then $\hat{z}$ is 
found by computing $A\hat{x}$. 



X 


a 


1 

11. Hint: Let x = 

y 

, b = 

b 

,V = 

-2 




c 


5 


,and 


.The given set of 


3 3 3 
/ / / 

2 2 1 


al, 

n 

go 

ho 

1-th 

o 

11 . 
+ 6 


1/3 

1/3 

1/3 

1/3 

1/3 

1/3 

1/3 

1/3 

1/3 


35. Hint: $(uu^T)x = u(u^Tx) = (u^Tx)u$, because $u^Tx$ is a scalar. 


Section 7.2, page 406 
1. a. $5x_1^2 + \frac{2}{3}x_1x_2 + x_2^2$   b. 185   c. 16 
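A quadratic form $x^TAx$ such as the one in Exercise 1 can be evaluated mechanically. The sketch below is an assumption-laden illustration: the symmetric matrix is the one implied by a form $5x_1^2 + \frac{2}{3}x_1x_2 + x_2^2$, and the two input vectors are hypothetical choices that happen to give the values 185 and 16.

```python
from fractions import Fraction

def quadratic_form(A, x):
    """Evaluate x^T A x for a square matrix A (list of rows) and vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Assumed matrix of the form: the cross-term coefficient 2/3 is split
# evenly between the two off-diagonal entries.
A = [[Fraction(5), Fraction(1, 3)],
     [Fraction(1, 3), Fraction(1)]]

value_b = quadratic_form(A, [6, 1])  # hypothetical input
value_c = quadratic_form(A, [1, 3])  # hypothetical input
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared with integers without rounding concerns.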


3. a. 


5. a. 


10 -3 
-3 -3_ 

8-3 2' 

-3 7 -1 

2 -1 -3 


7. x = Py, where P 




$y^TDy = 6y_1^2 - 4y_2^2$ 


In Exercises 9-14, other answers (change of variables and new 
quadratic form) are possible. 


7 ! 


Ti 


T^L-3 


3 


9. Positive definite; eigenvalues are 7 and 2 
Change of variable: x = Py, with P 
New quadratic form: $7y_1^2 + 2y_2^2$ 

11. Indefinite; eigenvalues are 7 and —3 
Change of variable: x = Py, with P 
New quadratic form: $7y_1^2 - 3y_2^2$ 

13. Positive semidefinite; eigenvalues are 10 and 0 

1 r 

Change of variable: x = Py, with P 

New quadratic form: $10y_1^2$ 

15. [M] Negative semidefinite; eigenvalues are 0, —6, —8, —12 
Change of variable: x = Py; 


P 


New quadratic form: $-6y_2^2 - 8y_3^2 - 12y_4^2$ 

17. [M] Indefinite; eigenvalues are 8.5 and —6.5 
Change of variable: x = Py; 


P 


V50 



0 

2 

3 

b. 

2 

0 

-4 


3 

-4 

0 


33. $A = 8u_1u_1^T + 6u_2u_2^T + 3u_3u_3^T$ 
1/2 - 1/2 0 


- 1/2 1/2 


0 


0 



5 3/2 

3/2 


0 


25. See the Study Guide. 

27. $(B^TAB)^T = B^TA^TB^{TT}$   Product of transposes in 
reverse order 

$= B^TAB$   Because A is symmetric 

The result about $B^TB$ is a special case when A = I. 
$(BB^T)^T = B^{TT}B^T = BB^T$, so $BB^T$ is symmetric. 

29. Hint: Use an orthogonal diagonalization of A, or appeal to 
Theorem 2. 

31. The Diagonalization Theorem in Section 5.3 says that the 
columns of P are (linearly independent) eigenvectors 
corresponding to the eigenvalues of A listed on the diagonal 
of D. So P has exactly k columns of eigenvectors 
corresponding to $\lambda$. These k columns form a basis for the 
eigenspace. 




3-434 
5 0-50 

4 3 4 -3 
0 5 0 5 


6 6 6 

/ / / 



6 6 6 

/ / / 



6 6 6 
/ / / 

112 


D 


7 




V2OW 


1/4/V6AV6 
1 / 1 / 1 / 2 / 1 / 

-^ - 
4/1/ 


V2v^ 

1 / 1 / 


v^v^vs 
/ / / 

11 1A 11 


3 3 3 
/ / / 

2 12 

I - 

V45V45V45 
4/2/5 /『 

v^svso 
1 / 2 / 


V20V2 


V2V2 


0 


5 5 5 5 
7 0 I. . I. . 

00 .5.5.5.5 


V6V6V/6 
/ / / 

112 

-I 

V2f_ 

1 / 1 / 

I 

V3V3^ 
/ / / 

1A 11 1± 


尸 

13 . 


尸 

15 . 


P 

17 . 


D 


尸 

19 . 


D 


尸 

21 . 


D 


P 

3 . 

2 


D 


V2^ 

1 / 1 / 


2 

/ 


/ / / 

11 11 11 


V6V6V6 
/ / / 

2 11 
Section 7.3, page 413 


Py, where P 


1/3 

2/3 

-2/3 


5. a. 7 


b •士 


-1/V2 

1/V2 


7 •士 


1/3' 

2/3 

2/3 


9. 5+ 


11. 3 


13. Hint: If m = M, take $\alpha = 0$ in the formula for x. That is, 
let $x = u_n$, and verify that $x^TAx = m$. If m < M and if t is 
a number between m and M, then $0 \le t - m \le M - m$ and 
$0 \le (t - m)/(M - m) \le 1$. So let $\alpha = (t - m)/(M - m)$. 
Solve the expression for $\alpha$ to see that $t = (1 - \alpha)m + \alpha M$. 
As $\alpha$ goes from 0 to 1, t goes from m to M. Construct x as 
in the statement of the exercise, and verify its properties. 


15. [M] a. 7.5 b. 


-.5 


17. [M] a. -4 


b. 


- 3/712 

i/Vn 

1 /VT 2 

1 /V 12 


-10 


17. 


19. 


21 . 


23. 


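$(\lambda - \lambda_1)(\lambda - \lambda_2) = \lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1\lambda_2$ 

Equate coefficients to obtain $\lambda_1 + \lambda_2 = a + d$ and 
$\lambda_1\lambda_2 = ad - b^2 = \det A$. 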

25. Exercise 27 in Section 7.1 showed that $B^TB$ is symmetric. 
Also, $x^TB^TBx = (Bx)^TBx = \|Bx\|^2 \ge 0$, so the quadratic 
form is positive semidefinite, and we say that the matrix 
$B^TB$ is positive semidefinite. Hint: To show that $B^TB$ is 
positive definite when B is square and invertible, suppose 
that $x^TB^TBx = 0$ and deduce that x = 0. 

27. Hint: Show that A + B is symmetric and the quadratic 
form $x^T(A + B)x$ is positive definite. 


New quadratic form: $8.5y_1^2 + 8.5y_2^2 - 6.5y_3^2 - 6.5y_4^2$ 
19. 8   21. See the Study Guide. 

23. Write the characteristic polynomial in two ways: 

$$\det(A - \lambda I) = \det\begin{bmatrix} a - \lambda & b \\ b & d - \lambda \end{bmatrix} = \lambda^2 - (a + d)\lambda + ad - b^2$$

and 


7. 


1/V5 -2/V5 

2/V5 1/V5 

1 /V 2 - 1 /V 2 0 

0 0 1 
1 /V 2 1 /V 2 0 


3V10 

0 

0 

V^o 

0 

0 


3/7To -1/^ 

1/VlO 3/VlO 


3 2 

2 3 


'l/V2 

-i/w_ 

"5 0 0" 

1 /V 2 

1 /V 2 

_0 3 0_ 


1/V2 

-1/VT8 
-2/3 

a. rank^4 = 2 

b. Basis for Col A: 

Basis for Nul A: 


1 /V 2 0 

1/Vl8 -4/^18 
1/3 J 


2/3 

.40 「 

.37 

-.84 

.58 

-.58 

.58 


-.78 

-.33 

-.52 


(Remember that $V^T$ appears in the SVD.) 

Let $A = U\Sigma V^T = U\Sigma V^{-1}$. Since A is square and 
invertible, rank A = n, and all the entries on the diagonal of 
$\Sigma$ must be nonzero. So $A^{-1} = (U\Sigma V^{-1})^{-1} = V\Sigma^{-1}U^{-1}$ 
$= V\Sigma^{-1}U^T$. 

Hint: Since U and V are orthogonal, 

$A^TA = (U\Sigma V^T)^TU\Sigma V^T = V\Sigma^TU^TU\Sigma V^T$ 
$= V(\Sigma^T\Sigma)V^{-1}$ 

Thus V diagonalizes $A^TA$. What does this tell you about V? 

Let $A = U\Sigma V^T$. The matrix PU is orthogonal, because P 
and U are both orthogonal. (See Exercise 29 in 
Section 6.2.) So the equation $PA = (PU)\Sigma V^T$ has the form 
required for a singular value decomposition. By Exercise 
19, the diagonal entries in $\Sigma$ are the singular values of PA. 

Hint: Use a column-row expansion of $(U\Sigma)V^T$. 


r 3 0 " 

2 /V 5 

1/V5 

|_0 2_ 

-1/V5 

2/V5 


Section 7.4, page 423 
1. 3, 1 3. 3,2 

The answers in Exercises 5-13 are not the only possibilities. 

5. 


"-3 

O' 


'-1 

0" 

"3 

0" 

"1 

0" 

0 

0 


0 

1 

0 

0 

0 

1 


2/75 l/v/5 

-1/V5 2/V5 


■-1/3 

2/3 

2/3" 

~3^ 

0~ 

2/3 

-1/3 

2/3 

0 

0 

2/3 

2/3 

-1/3 

0 

0 


3 3 3 
/ / / 


15 


3 3 3 
/ / / 


士 

b. 


a. 
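$$S = \frac{1}{N-1}BB^T = \frac{1}{N-1}\begin{bmatrix} \hat{X}_1 & \cdots & \hat{X}_N \end{bmatrix}\begin{bmatrix} \hat{X}_1^T \\ \vdots \\ \hat{X}_N^T \end{bmatrix}$$
$$= \frac{1}{N-1}\sum_{k=1}^{N}\hat{X}_k\hat{X}_k^T = \frac{1}{N-1}\sum_{k=1}^{N}(X_k - M)(X_k - M)^T$$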

Chapter 7 Supplementary Exercises, page 432 


3. If rank A = r, then dim Nul A = n - r, by the Rank 
Theorem. So 0 is an eigenvalue of multiplicity n — r. 
Hence, of the n terms in the spectral decomposition of A, 
exactly n — r are zero. The remaining r terms 
(corresponding to the nonzero eigenvalues) are all rank 1 
matrices, as mentioned in the discussion of the spectral 
decomposition. 

5. If $Av = \lambda v$ for some nonzero $\lambda$, then 
$v = \lambda^{-1}Av = A(\lambda^{-1}v)$, which shows that v is a linear 
combination of the columns of A. 

7. Hint: If A = R T R, where R is invertible, then A is positive 
definite, by Exercise 25 in Section 7.2. Conversely, 
suppose that A is positive definite. Then by Exercise 26 in 
Section 7.2, A = B T B for some positive definite matrix B. 
Explain why B admits a QR factorization, and use it to 
create the Cholesky factorization of A. 

9. If A is m x n and x is in $\mathbb{R}^n$, then $x^TA^TAx = (Ax)^T(Ax) = \|Ax\|^2 \ge 0$. 
Thus $A^TA$ is positive semidefinite. By 
Exercise 22 in Section 6.5, rank $A^TA$ = rank A. 

11. Hint: Write an SVD of A in the form $A = U\Sigma V^T = PQ$, 
where $P = U\Sigma U^T$ and $Q = UV^T$. Show that P is 
symmetric and has the same eigenvalues as $\Sigma$. Explain why 
Q is an orthogonal matrix. 

13. a. If b = Ax, then $x^+ = A^+b = A^+Ax$. By 
Exercise 12(a), $x^+$ is the orthogonal projection of x 
onto Row A. 

b. From (a) and then Exercise 12(c), 
$Ax^+ = A(A^+Ax) = (AA^+A)x = Ax = b$. 

c. Since $x^+$ is the orthogonal projection onto Row A, the 
Pythagorean Theorem shows that 
$\|u\|^2 = \|x^+\|^2 + \|u - x^+\|^2$. Part (c) follows 
immediately. 

15. [M] $A^+ = \dfrac{1}{40}\begin{bmatrix} -2 & -14 & 13 & 13 \\ -2 & -14 & 13 & 13 \\ -2 & 6 & -7 & -7 \\ 2 & -6 & 7 & 7 \\ 4 & -12 & -6 & -6 \end{bmatrix}$, $\hat{x} = \begin{bmatrix} .7 \\ .7 \\ -.8 \\ .8 \\ .6 \end{bmatrix}$ 

The reduced echelon form of $\begin{bmatrix} A \\ \hat{x}^T \end{bmatrix}$ is the same as the 
reduced echelon form of A, except for an extra row of 

25. Hint: Consider the SVD for the standard matrix of T, say, 
$A = U\Sigma V^T = U\Sigma V^{-1}$. Let $B = \{v_1, \ldots, v_n\}$ and 
$C = \{u_1, \ldots, u_m\}$ be bases constructed from the columns of 
V and U, respectively. Compute the matrix for T relative 
to B and C, as in Section 5.4. To do this, you must show 
that $V^{-1}v_j = e_j$, the jth column of $I_n$. 


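27. [M] The three factors, in order: 

$$\begin{bmatrix} -.57 & -.65 & -.42 & .27 \\ .63 & -.24 & -.68 & -.29 \\ .07 & -.63 & .53 & -.56 \\ -.51 & .34 & -.29 & -.73 \end{bmatrix}$$

$$\times \begin{bmatrix} 16.46 & 0 & 0 & 0 & 0 \\ 0 & 12.16 & 0 & 0 & 0 \\ 0 & 0 & 4.87 & 0 & 0 \\ 0 & 0 & 0 & 4.31 & 0 \end{bmatrix}$$

$$\times \begin{bmatrix} -.10 & .61 & -.21 & -.52 & .55 \\ -.39 & .29 & .84 & -.14 & -.19 \\ -.74 & -.27 & -.07 & .38 & .49 \\ .41 & -.50 & .45 & -.23 & .58 \\ -.36 & -.48 & -.19 & -.72 & -.29 \end{bmatrix}$$

29. [M] 25.9343, 16.7554, 11.2917, 1.0785, .00037793; 
$\sigma_1/\sigma_5 = 68{,}622$ 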


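Section 7.5, page 430 

1. $M = \begin{bmatrix} 12 \\ 10 \end{bmatrix}$, $B = \begin{bmatrix} 7 & 10 & -6 & -9 & -10 & 8 \\ 2 & -4 & -1 & 5 & 3 & -5 \end{bmatrix}$, $S = \begin{bmatrix} 86 & -27 \\ -27 & 16 \end{bmatrix}$ 

3. $\begin{bmatrix} .95 \\ -.32 \end{bmatrix}$ for $\lambda = 95.2$, $\begin{bmatrix} .32 \\ .95 \end{bmatrix}$ for $\lambda = 6.8$ 

5. [M] (.130, .874, .468), 75.9% of the variance 
7. $y_1 = .95x_1 - .32x_2$; $y_1$ explains 93.3% of the variance. 

9. $c_1 = 1/3$, $c_2 = 2/3$, $c_3 = 2/3$; the variance of y is 9. 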
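The sample covariance matrix in Exercise 1 follows from the mean-deviation matrix B by the formula $S = \frac{1}{N-1}BB^T$. The sketch below recomputes S from the rows of B as read from the answer above; the variable names are illustrative.

```python
# Rows of the mean-deviation matrix B (each row sums to zero).
B = [[7, 10, -6, -9, -10, 8],
     [2, -4, -1, 5, 3, -5]]

N = len(B[0])  # number of observations

# S[i][j] = (row i of B) . (row j of B) / (N - 1)
S = [[sum(a * b for a, b in zip(row_i, row_j)) / (N - 1) for row_j in B]
     for row_i in B]

total_variance = S[0][0] + S[1][1]  # trace of S
```

The trace, 102, is the total variance, which is also the sum of the two eigenvalues 95.2 and 6.8 listed in Exercise 3 (up to rounding).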

11. a. If w is the vector in $\mathbb{R}^N$ with a 1 in each position, then 

$[\,X_1 \ \cdots \ X_N\,]\,w = X_1 + \cdots + X_N = 0$ 
because the $X_k$ are in mean-deviation form. Then 
$[\,Y_1 \ \cdots \ Y_N\,]\,w$ 

$= [\,P^TX_1 \ \cdots \ P^TX_N\,]\,w$   By definition 

$= P^T[\,X_1 \ \cdots \ X_N\,]\,w = P^T0 = 0$ 

That is, $Y_1 + \cdots + Y_N = 0$, so the $Y_k$ are in 
mean-deviation form. 

b. Hint: Because the $X_j$ are in mean-deviation form, the 
covariance matrix of the $X_j$ is 

$\dfrac{1}{N-1}[\,X_1 \ \cdots \ X_N\,][\,X_1 \ \cdots \ X_N\,]^T$ 

Compute the covariance matrix of the $Y_j$, using part (a). 
13. If $B = [\,\hat{X}_1 \ \cdots \ \hat{X}_N\,]$, then 


1. a. T  b. F  c. T  d. F  e. F  f. F  g. F  h. T  i. F  j. F 
   k. F  l. F  m. T  n. F  o. T  p. T  q. F 
zeros. So adding scalar multiples of the rows of A to $\hat{x}^T$ can 
produce the zero vector, which shows that $\hat{x}^T$ is in Row A. 



0 

0 


Basis for Nul A: 


0 , 1 
0 1 

0 0 


Chapter 8 


Section 8.1, page 442 


25. To show that $D \subset E \cap F$, show that $D \subset E$ and $D \subset F$. 
The complete proof is presented in the Study Guide. 


Section 8.2, page 452 

1. Affinely dependent and $2v_1 + v_2 - 3v_3 = 0$ 

3. The set is affinely independent. If the points are called $v_1$, 
$v_2$, $v_3$, and $v_4$, then $\{v_1, v_2, v_3\}$ is a basis for $\mathbb{R}^3$ and 
$v_4 = 16v_1 + 5v_2 - 3v_3$, but the weights in the linear 
combination do not sum to 1. 


1. Some possible answers: $y = 2v_1 - 1.5v_2 + .5v_3$, 
$y = 2v_1 - 2v_3 + v_4$, $y = 2v_1 + 3v_2 - 7v_3 + 3v_4$ 


5. $-4v_1 + 5v_2 - 4v_3 + 3v_4 = 0$ 

7. The barycentric coordinates are (-2, 4, -1). 


3. $y = -3v_1 + 2v_2 + 2v_3$. The weights sum to 1, so this is an 
affine sum. 

5. a. $p_1 = 3b_1 - b_2 - b_3 \in \operatorname{aff} S$ since the coefficients sum 
to 1. 

b. $p_2 = 2b_1 + 0b_2 + b_3 \notin \operatorname{aff} S$ since the coefficients do 
not sum to 1. 

c. $p_3 = -b_1 + 2b_2 + 0b_3 \in \operatorname{aff} S$ since the coefficients 
sum to 1. 

7. a. $p_1 \in \operatorname{Span} S$, but $p_1 \notin \operatorname{aff} S$ 

b. $p_2 \in \operatorname{Span} S$, and $p_2 \in \operatorname{aff} S$ 

c. $p_3 \notin \operatorname{Span} S$, so $p_3 \notin \operatorname{aff} S$ 

and V 2 = \ . Other answers are possible. 

— L 

11. See the Study Guide. 

13. Span $\{v_2 - v_1, v_3 - v_1\}$ is a plane if and only if 
$\{v_2 - v_1, v_3 - v_1\}$ is linearly independent. Suppose $c_2$ and 
$c_3$ satisfy $c_2(v_2 - v_1) + c_3(v_3 - v_1) = 0$. Show that this 
implies $c_2 = c_3 = 0$. 

15. Let S = {x : Ax = b}. To show that S is affine, it suffices 
to show that S is a flat, by Theorem 3. Let 
W = {x : Ax = 0}. Then W is a subspace of $\mathbb{R}^n$, by 
Theorem 2 in Section 4.2 (or Theorem 12 in Section 2.8). 
Since S = W + p, where p satisfies Ap = b, by Theorem 
6 in Section 1.5, S is a translate of W, and hence S is a flat. 

17. A suitable set consists of any three vectors that are not 
collinear and have 5 as their third entry. If 5 is their third 
entry, they lie in the plane z = 5. If the vectors are not 
collinear, their affine hull cannot be a line, so it must be the 
plane. 

19. If $p, q \in f(S)$, then there exist $r, s \in S$ such that f(r) = p 
and f(s) = q. Given any $t \in \mathbb{R}$, we must show that 
z = (1 - t)p + tq is in f(S). Now use definitions of p and 
q, and the fact that f is linear. The complete proof is 
presented in the Study Guide. 



9. See the Study Guide. 


11. When a set of five points is translated by subtracting, say, 
the first point, the new set of four points must be linearly 
dependent, by Theorem 8 in Section 1.7, because the four 
points are in R 3 . By Theorem 5, the original set of five 
points is affinely dependent. 


13. If $\{v_1, v_2\}$ is affinely dependent, then there exist $c_1$ and $c_2$, 
not both zero, such that $c_1 + c_2 = 0$ and $c_1v_1 + c_2v_2 = 0$. 
Show that this implies $v_1 = v_2$. For the converse, suppose 
$v_1 = v_2$ and select specific $c_1$ and $c_2$ that show their affine 
dependence. The details are in the Study Guide. 


15. a. 


The vectors y 2 — Vi = 


and y 3 — vi = 



are 


not multiples and hence are linearly independent. By 
Theorem 5, S is affinely independent. 


b * Pi ^ ( _ 養， I ， 暑)， P 2 * (0, I ，臺)， P 3 O (专， 一 i ， _ 套)， 

P 4 ^ ( i ， _ 誉， |)， P 5 ^ 暑） 


c. Pg is (— ， 一 ， +)， p 7 is ( 0 , + ， 一 )，and p 8 is (+ ， + ， 一 ). 


17. Suppose $S = \{b_1, \ldots, b_k\}$ is an affinely independent set. 
Then equation (7) has a solution, because p is in aff S. 
Hence equation (8) has a solution. By Theorem 5, the 
homogeneous forms of the points in S are linearly 
independent. Thus (8) has a unique solution. Then (7) also 
has a unique solution, because (8) encodes both equations 
that appear in (7). 


The following argument mimics the proof of Theorem 7 in 
Section 4.4. If $S = \{b_1, \ldots, b_k\}$ is an affinely independent 
set, then scalars $c_1, \ldots, c_k$ exist that satisfy (7), by 
definition of aff S. Suppose x also has the representation 

$x = d_1b_1 + \cdots + d_kb_k$ and $d_1 + \cdots + d_k = 1$   (7a) 

for scalars $d_1, \ldots, d_k$. Then subtraction produces the 
equation 

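21. Since B is affine, Theorem 1 implies that B contains all 
affine combinations of points of B. Hence B contains all 
affine combinations of points of A. That is, $\operatorname{aff} A \subset B$. 

23. Since $A \subset (A \cup B)$, it follows from Exercise 22 that 
$\operatorname{aff} A \subset \operatorname{aff}(A \cup B)$. Similarly, $\operatorname{aff} B \subset \operatorname{aff}(A \cup B)$, so 
$[\operatorname{aff} A \cup \operatorname{aff} B] \subset \operatorname{aff}(A \cup B)$. 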


$0 = x - x = (c_1 - d_1)b_1 + \cdots + (c_k - d_k)b_k$   (7b) 

The weights in (7b) sum to 0 because the c's and the d's 
separately sum to 1. This is impossible, unless each weight 
in (8) is 0, because S is an affinely independent set. This 
proves that $c_i = d_i$ for $i = 1, \ldots, k$. 
19. If $\{p_1, p_2, p_3\}$ is an affinely dependent set, then there exist 
scalars $c_1$, $c_2$, and $c_3$, not all zero, such that 
$c_1p_1 + c_2p_2 + c_3p_3 = 0$ and $c_1 + c_2 + c_3 = 0$. Now use 
the linearity of f. 

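21. Let $a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$, $b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$, and $c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$. Then 

$$\det[\,\tilde{a} \ \ \tilde{b} \ \ \tilde{c}\,] = \det\begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ 1 & 1 & 1 \end{bmatrix} = \det\begin{bmatrix} a_1 & a_2 & 1 \\ b_1 & b_2 & 1 \\ c_1 & c_2 & 1 \end{bmatrix}$$

by the transpose property of the 
determinant (Theorem 5 in Section 3.2). By Exercise 30 in 
Section 3.3, this determinant equals 2 times the area of the 
triangle with vertices at a, b, and c. 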


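23. If $[\,\tilde{a} \ \ \tilde{b} \ \ \tilde{c}\,]\begin{bmatrix} r \\ s \\ t \end{bmatrix} = \tilde{p}$, then Cramer's rule gives 
$r = \det[\,\tilde{p} \ \ \tilde{b} \ \ \tilde{c}\,] / \det[\,\tilde{a} \ \ \tilde{b} \ \ \tilde{c}\,]$. By Exercise 21, the 
numerator of this quotient is twice the area of $\triangle pbc$, and 
the denominator is twice the area of $\triangle abc$. This proves the 
formula for r. The other formulas are proved using 
Cramer's rule for s and t. 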
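The Cramer's-rule computation in Exercises 21 and 23 can be written out for points in the plane. The sketch below is illustrative only: the function names and the sample triangle are assumptions, and each point is lifted to its homogeneous form by appending a 1.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def det_of_columns(c1, c2, c3):
    """Determinant of the 3 x 3 matrix whose columns are c1, c2, c3."""
    return det3([[c1[i], c2[i], c3[i]] for i in range(3)])

def barycentric(a, b, c, p):
    """Barycentric coordinates (r, s, t) of p relative to triangle a, b, c."""
    ta, tb, tc, tp = ([v[0], v[1], 1.0] for v in (a, b, c, p))
    denom = det_of_columns(ta, tb, tc)   # twice the signed area of abc
    r = det_of_columns(tp, tb, tc) / denom
    s = det_of_columns(ta, tp, tc) / denom
    t = det_of_columns(ta, tb, tp) / denom
    return r, s, t

# Hypothetical triangle and point.
r, s, t = barycentric((0, 0), (4, 0), (0, 4), (1, 1))
```

Each coordinate is a ratio of (signed) areas, so the three values sum to 1, and all three are nonnegative exactly when p lies in the triangle.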

Section 8.3, page 459 

1. See the Study Guide. 

3. None are in conv S. 

5. p L = -gVi + \\2 + |v 3 + ^V4, so P! ^ conv S. 

P 2 = |vi + \y 2 + |y 3 + |y 4 , so p 2 g conv 5. 

7. a. The barycentric coordinates of p l9 p 2 , P 3 , and p 4 are, 
respectively, (H ， !) ， (0, H), |), and 

G ， b~\)- 

b. $p_3$ and $p_4$ are outside conv T. $p_1$ is inside conv T. 
$p_2$ is on the edge $v_2v_3$ of conv T. 

9. $p_1$ and $p_3$ are outside the tetrahedron conv S. $p_2$ is on the 
face containing the vertices $v_2$, $v_3$, and $v_4$. $p_4$ is inside 
conv S. $p_5$ is on the edge between $v_1$ and $v_3$. 

11. See the Study Guide. 

13. If $p, q \in f(S)$, then there exist $r, s \in S$ such that f(r) = p 
and f(s) = q. The goal is to show that the line segment 
y = (1 - t)p + tq, for $0 \le t \le 1$, is in f(S). Use the 
linearity of f and the convexity of S to show that 
y = f(w) for some w in S. This will show that y is in f(S) 
and that f(S) is convex. 

15. p = |vi + \\ 2 + |v 4 and p = ^Vi + \\ 2 + |v 3 . 

17. Suppose A C B, where B is convex. Then, since B is 
convex, Theorem 7 implies that B contains all convex 
combinations of points of B. Hence B contains all convex 
combinations of points of A. That is, conv A C B. 

19. a. Use Exercise 18 to show that conv $A$ and conv $B$ are
both subsets of conv $(A \cup B)$. This will imply that their
union is also a subset of conv $(A \cup B)$.

b. One possibility is to let $A$ be two adjacent corners of a
square and let $B$ be the other two corners. Then what is
(conv $A$) $\cup$ (conv $B$), and what is conv $(A \cup B)$?

21. [Figure omitted.]

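23. $\mathbf{g}(t) = (1 - t)\mathbf{f}_0(t) + t\,\mathbf{f}_1(t)$

$= (1 - t)[(1 - t)\mathbf{p}_0 + t\mathbf{p}_1] + t[(1 - t)\mathbf{p}_1 + t\mathbf{p}_2]$

$= (1 - t)^2\mathbf{p}_0 + 2t(1 - t)\mathbf{p}_1 + t^2\mathbf{p}_2$.

The sum of the weights in the linear combination for $\mathbf{g}$ is
$(1 - t)^2 + 2t(1 - t) + t^2$, which equals
$(1 - 2t + t^2) + (2t - 2t^2) + t^2 = 1$. The weights are each
between 0 and 1 when $0 \le t \le 1$, so $\mathbf{g}(t)$ is in
conv$\{\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2\}$.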
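The expansion in Exercise 23 can be checked with a short Python sketch. The helper names (`lerp`, `g_nested`, `g_expanded`, `weights`) and the control points are hypothetical choices for this illustration. Exact rational arithmetic from the standard `fractions` module is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction


def lerp(p, q, t):
    """Point (1 - t) p + t q on the segment from p to q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))


def g_nested(p0, p1, p2, t):
    """g(t) = (1 - t) f0(t) + t f1(t), built from two nested interpolations."""
    f0 = lerp(p0, p1, t)
    f1 = lerp(p1, p2, t)
    return lerp(f0, f1, t)


def weights(t):
    """Weights of p0, p1, p2 in the expanded form of g(t)."""
    return ((1 - t) ** 2, 2 * t * (1 - t), t ** 2)


def g_expanded(p0, p1, p2, t):
    """g(t) = (1 - t)^2 p0 + 2t(1 - t) p1 + t^2 p2."""
    w = weights(t)
    return tuple(w[0] * a + w[1] * b + w[2] * c for a, b, c in zip(p0, p1, p2))


# Hypothetical control points in the plane.
p0 = (Fraction(0), Fraction(0))
p1 = (Fraction(1), Fraction(2))
p2 = (Fraction(3), Fraction(0))

for k in range(5):
    t = Fraction(k, 4)
    # The nested and expanded forms agree at every sampled parameter value.
    assert g_nested(p0, p1, p2, t) == g_expanded(p0, p1, p2, t)
    # The weights sum to 1 and each lies between 0 and 1.
    assert sum(weights(t)) == 1
    assert all(0 <= w <= 1 for w in weights(t))
```

Sampling five parameter values does not prove the identity, but it is a quick sanity check on the algebra above.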


Section 8.4, page 467

1. $f(x_1, x_2) = 3x_1 + 4x_2$ and $d = 13$

3. a. Open b. Closed c. Neither d. Closed e. Closed

5. a. Not compact, convex

b. Compact, convex

c. Not compact, convex

d. Not compact, not convex

e. Not compact, convex

7. a. $\mathbf{n} = \begin{bmatrix} 0 \\ 2 \\ 3 \end{bmatrix}$ or a multiple

b. $f(\mathbf{x}) = 2x_2 + 3x_3$, $d = 11$

9. a. $\mathbf{n} = \begin{bmatrix} 3 \\ -1 \\ 2 \\ 1 \end{bmatrix}$ or a multiple

b. $f(\mathbf{x}) = 3x_1 - x_2 + 2x_3 + x_4$, $d = 5$

11. $\mathbf{v}_2$ is on the same side as $\mathbf{0}$, $\mathbf{v}_1$ is on the other side, and $\mathbf{v}_3$ is
in $H$.


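13. One possibility is $\mathbf{p} = \begin{bmatrix} 32 \\ -14 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{v}_1 = \begin{bmatrix} 10 \\ -7 \\ 1 \\ 0 \end{bmatrix}$, $\mathbf{v}_2 = \ldots$

15. $f(x_1, x_2, x_3, x_4) = x_1 - 3x_2 + 4x_3 - 2x_4$, and $d = 5$

17. $f(x_1, x_2, x_3) = x_1 - 2x_2 + x_3$, and $d = 0$

19. $f(x_1, x_2, x_3) = -5x_1 + 3x_2 + x_3$, and $d = 0$
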
21. See the Study Guide.

23. $f(x_1, x_2) = 3x_1 - 2x_2$ with $d$ satisfying $9 < d < 10$ is one
possibility.

25. $f(x, y) = 4x + y$. A natural choice for $d$ is 12.75, which
equals $f(3, .75)$. The point $(3, .75)$ is three-fourths of the
distance between the center of $B(\mathbf{0}, 3)$ and the center of
$B(\mathbf{p}, 1)$.

27. Exercise 2(a) in Section 8.3 gives one possibility. Or let
$S = \{(x, y) : x^2y^2 = 1 \text{ and } y > 0\}$. Then conv $S$ is the
upper (open) half-plane.

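29. Let $\mathbf{x}, \mathbf{y} \in B(\mathbf{p}, \delta)$ and suppose $\mathbf{z} = (1 - t)\mathbf{x} + t\mathbf{y}$, where
$0 \le t \le 1$. Then show that

$\|\mathbf{z} - \mathbf{p}\| = \|[(1 - t)\mathbf{x} + t\mathbf{y}] - \mathbf{p}\|$

$= \|(1 - t)(\mathbf{x} - \mathbf{p}) + t(\mathbf{y} - \mathbf{p})\| < \delta$.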
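A sketch of the remaining estimate in Exercise 29, assuming (as in the definition of an open ball) that $\|\mathbf{x} - \mathbf{p}\| < \delta$ and $\|\mathbf{y} - \mathbf{p}\| < \delta$, uses the triangle inequality:

```latex
\begin{aligned}
\|(1 - t)(\mathbf{x} - \mathbf{p}) + t(\mathbf{y} - \mathbf{p})\|
  &\le (1 - t)\,\|\mathbf{x} - \mathbf{p}\| + t\,\|\mathbf{y} - \mathbf{p}\| \\
  &< (1 - t)\,\delta + t\,\delta = \delta .
\end{aligned}
```

The second inequality is strict because $1 - t$ and $t$ are nonnegative and at least one of them is positive.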

Section 8.5, page 479 

1. a. $m = 1$ at the point $\mathbf{p}_1$ b. $m = 5$ at the point $\mathbf{p}_2$
c. $m = 5$ at the point $\mathbf{p}_3$

3. a. $m = -3$ at the point $\mathbf{p}_3$

b. $m = 1$ on the set conv$\{\mathbf{p}_1, \mathbf{p}_3\}$

c. $m = -3$ on the set conv$\{\mathbf{p}_1, \mathbf{p}_2\}$


9. The origin is an extreme point, but it is not a vertex. 
Explain why. 



11. One possibility is to let S be a square that includes part of
the boundary but not all of it. For example, include just two 
adjacent edges. The convex hull of the profile P is a 
triangular region. 



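13. a. $f_0(C^5) = 32$, $f_1(C^5) = 80$, $f_2(C^5) = 80$,
$f_3(C^5) = 40$, $f_4(C^5) = 10$, and
$32 - 80 + 80 - 40 + 10 = 2$.

b.

$$\begin{array}{c|rrrrr}
 & f_0 & f_1 & f_2 & f_3 & f_4 \\ \hline
C^1 & 2 & & & & \\
C^2 & 4 & 4 & & & \\
C^3 & 8 & 12 & 6 & & \\
C^4 & 16 & 32 & 24 & 8 & \\
C^5 & 32 & 80 & 80 & 40 & 10
\end{array}$$

For a general formula, see the Study Guide.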
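The table in Exercise 13(b) can be reproduced with the standard counting formula for the hypercube, $f_k(C^n) = 2^{n-k}\binom{n}{k}$. The formula is supplied here as a well-known fact and is only assumed to match the one in the Study Guide; the function name `hypercube_faces` is hypothetical.

```python
from math import comb


def hypercube_faces(n, k):
    """Number of k-dimensional faces of the n-dimensional hypercube.

    A k-face is fixed by choosing which k coordinates vary (comb(n, k) ways)
    and setting each of the other n - k coordinates to one of its two
    extreme values (2 ** (n - k) ways).
    """
    return 2 ** (n - k) * comb(n, k)


for n in range(1, 6):
    row = [hypercube_faces(n, k) for k in range(n)]
    # Alternating sum f_0 - f_1 + f_2 - ... for the Euler formula check.
    alternating = sum((-1) ** k * f for k, f in enumerate(row))
    print(n, row, alternating)
```

For $n = 5$ the alternating sum is 2, in agreement with part (a).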

15. a. $f_0(P^n) = f_0(Q) + 1$

b. $f_k(P^n) = f_k(Q) + f_{k-1}(Q)$

c. $f_{n-1}(P^n) = f_{n-2}(Q) + 1$

17. See the Study Guide. 

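19. Let $S$ be convex and let $\mathbf{x} \in cS + dS$, where $c > 0$ and
$d > 0$. Then there exist $\mathbf{s}_1$ and $\mathbf{s}_2$ in $S$ such that
$\mathbf{x} = c\mathbf{s}_1 + d\mathbf{s}_2$. But then

$$\mathbf{x} = c\mathbf{s}_1 + d\mathbf{s}_2 = (c + d)\left(\frac{c}{c + d}\,\mathbf{s}_1 + \frac{d}{c + d}\,\mathbf{s}_2\right).$$

Now show that the expression on the right side is a member
of $(c + d)S$.

For the converse, pick a typical point in $(c + d)S$ and
show it is in $cS + dS$.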

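21. Hint: Suppose $A$ and $B$ are convex. Let $\mathbf{x}, \mathbf{y} \in A + B$.
Then there exist $\mathbf{a}, \mathbf{c} \in A$ and $\mathbf{b}, \mathbf{d} \in B$ such that $\mathbf{x} = \mathbf{a} + \mathbf{b}$
and $\mathbf{y} = \mathbf{c} + \mathbf{d}$. For any $t$ such that $0 \le t \le 1$, show that

$\mathbf{w} = (1 - t)\mathbf{x} + t\mathbf{y} = (1 - t)(\mathbf{a} + \mathbf{b}) + t(\mathbf{c} + \mathbf{d})$

represents a point in $A + B$.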

Section 8.6, page 490 

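1. The control points for $\mathbf{x}(t) + \mathbf{b}$ should be $\mathbf{p}_0 + \mathbf{b}$, $\mathbf{p}_1 + \mathbf{b}$,
and $\mathbf{p}_3 + \mathbf{b}$. Write the Bezier curve through these points,
and show algebraically that this curve is $\mathbf{x}(t) + \mathbf{b}$. See the
Study Guide.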

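3. a. $\mathbf{x}'(t) = (-3 + 6t - 3t^2)\mathbf{p}_0 + (3 - 12t + 9t^2)\mathbf{p}_1 + (6t - 9t^2)\mathbf{p}_2 + 3t^2\mathbf{p}_3$, so
$\mathbf{x}'(0) = -3\mathbf{p}_0 + 3\mathbf{p}_1 = 3(\mathbf{p}_1 - \mathbf{p}_0)$, and
$\mathbf{x}'(1) = -3\mathbf{p}_2 + 3\mathbf{p}_3 = 3(\mathbf{p}_3 - \mathbf{p}_2)$. This shows that the
tangent vector $\mathbf{x}'(0)$ points in the direction from $\mathbf{p}_0$ to $\mathbf{p}_1$
and is three times the length of $\mathbf{p}_1 - \mathbf{p}_0$. Likewise, $\mathbf{x}'(1)$
points in the direction from $\mathbf{p}_2$ to $\mathbf{p}_3$ and is three times
the length of $\mathbf{p}_3 - \mathbf{p}_2$. In particular, $\mathbf{x}'(1) = \mathbf{0}$ if and
only if $\mathbf{p}_3 = \mathbf{p}_2$.

b. $\mathbf{x}''(t) = (6 - 6t)\mathbf{p}_0 + (-12 + 18t)\mathbf{p}_1 + (6 - 18t)\mathbf{p}_2 + 6t\mathbf{p}_3$, so that

$\mathbf{x}''(0) = 6\mathbf{p}_0 - 12\mathbf{p}_1 + 6\mathbf{p}_2 = 6(\mathbf{p}_0 - \mathbf{p}_1) + 6(\mathbf{p}_2 - \mathbf{p}_1)$
and

$\mathbf{x}''(1) = 6\mathbf{p}_1 - 12\mathbf{p}_2 + 6\mathbf{p}_3 = 6(\mathbf{p}_1 - \mathbf{p}_2) + 6(\mathbf{p}_3 - \mathbf{p}_2)$

For a picture of $\mathbf{x}''(0)$, construct a coordinate system
with the origin at $\mathbf{p}_1$, temporarily, label $\mathbf{p}_0$ as $\mathbf{p}_0 - \mathbf{p}_1$,
and label $\mathbf{p}_2$ as $\mathbf{p}_2 - \mathbf{p}_1$. Finally, construct a line from
this new origin through the sum of $\mathbf{p}_0 - \mathbf{p}_1$ and $\mathbf{p}_2 - \mathbf{p}_1$,
extended out a bit. That line points in the direction of
$\mathbf{x}''(0)$.

[Figure omitted: the new origin $\mathbf{0} = \mathbf{p}_1$ with the point $\mathbf{p}_2 - \mathbf{p}_1$.]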
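The endpoint formulas in Exercise 3 can be spot-checked numerically. The sketch below evaluates the derivative weights given above at $t = 0$ and $t = 1$ and compares them with the differences of control points. All function names and the control points are assumptions chosen for this illustration.

```python
def combine(weights, points):
    """Linear combination sum_i weights[i] * points[i], coordinate by coordinate."""
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim))


def first_derivative(points, t):
    """x'(t) for a cubic Bezier curve, using the weights from Exercise 3(a)."""
    w = (-3 + 6 * t - 3 * t**2, 3 - 12 * t + 9 * t**2, 6 * t - 9 * t**2, 3 * t**2)
    return combine(w, points)


def second_derivative(points, t):
    """x''(t) for a cubic Bezier curve, using the weights from Exercise 3(b)."""
    w = (6 - 6 * t, -12 + 18 * t, 6 - 18 * t, 6 * t)
    return combine(w, points)


def scaled_difference(c, p, q):
    """The vector c * (p - q)."""
    return tuple(c * (a - b) for a, b in zip(p, q))


def add(u, v):
    """Coordinate-wise sum of two vectors."""
    return tuple(a + b for a, b in zip(u, v))


# Hypothetical control points with integer coordinates.
pts = [(0, 0), (1, 3), (4, 3), (6, 0)]
p0, p1, p2, p3 = pts

assert first_derivative(pts, 0) == scaled_difference(3, p1, p0)
assert first_derivative(pts, 1) == scaled_difference(3, p3, p2)
assert second_derivative(pts, 0) == add(
    scaled_difference(6, p0, p1), scaled_difference(6, p2, p1)
)
assert second_derivative(pts, 1) == add(
    scaled_difference(6, p1, p2), scaled_difference(6, p3, p2)
)
print(first_derivative(pts, 0), first_derivative(pts, 1))
```

Integer control points keep the arithmetic exact, so the equality tests are meaningful.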



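5. a. From Exercise 3(a) or equation (9) in the text,

$\mathbf{x}'(1) = 3(\mathbf{p}_3 - \mathbf{p}_2)$

Use the formula for $\mathbf{x}'(0)$, with the control points from
$\mathbf{y}(t)$, and obtain

$\mathbf{y}'(0) = -3\mathbf{p}_3 + 3\mathbf{p}_4 = 3(\mathbf{p}_4 - \mathbf{p}_3)$

For $C^1$ continuity, $3(\mathbf{p}_3 - \mathbf{p}_2) = 3(\mathbf{p}_4 - \mathbf{p}_3)$, so
$\mathbf{p}_3 = (\mathbf{p}_4 + \mathbf{p}_2)/2$, and $\mathbf{p}_3$ is the midpoint of the line
segment from $\mathbf{p}_2$ to $\mathbf{p}_4$.

b. If $\mathbf{x}'(1) = \mathbf{y}'(0) = \mathbf{0}$, then $\mathbf{p}_2 = \mathbf{p}_3$ and $\mathbf{p}_3 = \mathbf{p}_4$. Thus,
the "line segment" from $\mathbf{p}_2$ to $\mathbf{p}_4$ is just the point $\mathbf{p}_3$.
[Note: In this case, the combined curve is still $C^1$
continuous, by definition. However, some choices of
the other "control" points, $\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_5$, and $\mathbf{p}_6$, can
produce a curve with a visible corner at $\mathbf{p}_3$, in which
case the curve is not $G^1$ continuous at $\mathbf{p}_3$.]

7. Hint: Use $\mathbf{x}''(t)$ from Exercise 3 and adapt this for the
second curve to see that

$\mathbf{y}''(t) = 6(1 - t)\mathbf{p}_3 + 6(-2 + 3t)\mathbf{p}_4 + 6(1 - 3t)\mathbf{p}_5 + 6t\mathbf{p}_6$

Then set $\mathbf{x}''(1) = \mathbf{y}''(0)$. Since the curve is $C^1$ continuous
at $\mathbf{p}_3$, Exercise 5(a) says that the point $\mathbf{p}_3$ is the midpoint of
the segment from $\mathbf{p}_2$ to $\mathbf{p}_4$. This implies that
$\mathbf{p}_4 - \mathbf{p}_3 = \mathbf{p}_3 - \mathbf{p}_2$. Use this substitution to show that $\mathbf{p}_4$
and $\mathbf{p}_5$ are uniquely determined by $\mathbf{p}_1$, $\mathbf{p}_2$, and $\mathbf{p}_3$. Only $\mathbf{p}_6$
can be chosen arbitrarily.

9. Write a vector of the polynomial weights for $\mathbf{x}(t)$, expand
the polynomial weights, and factor the vector as $M_B\mathbf{u}(t)$:

$$\begin{bmatrix} 1 - 4t + 6t^2 - 4t^3 + t^4 \\ 4t - 12t^2 + 12t^3 - 4t^4 \\ 6t^2 - 12t^3 + 6t^4 \\ 4t^3 - 4t^4 \\ t^4 \end{bmatrix} = \begin{bmatrix} 1 & -4 & 6 & -4 & 1 \\ 0 & 4 & -12 & 12 & -4 \\ 0 & 0 & 6 & -12 & 6 \\ 0 & 0 & 0 & 4 & -4 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ t \\ t^2 \\ t^3 \\ t^4 \end{bmatrix}, \qquad M_B = \begin{bmatrix} 1 & -4 & 6 & -4 & 1 \\ 0 & 4 & -12 & 12 & -4 \\ 0 & 0 & 6 & -12 & 6 \\ 0 & 0 & 0 & 4 & -4 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

11. See the Study Guide.

13. a. Hint: Use the fact that $\mathbf{q}_0 = \mathbf{p}_0$.

b. Multiply the first and last parts of equation (13) by $\tfrac{8}{3}$
and solve for $8\mathbf{q}_2$.

c. Use equation (8) to substitute for $8\mathbf{q}_3$ and then apply
part (a).

15. a. From equation (11), $\mathbf{y}'(1) = .5\mathbf{x}'(.5) = \mathbf{z}'(0)$.

b. Observe that $\mathbf{y}'(1) = 3(\mathbf{q}_3 - \mathbf{q}_2)$. This follows from
equation (9), with $\mathbf{y}(t)$ and its control points in place of
$\mathbf{x}(t)$ and its control points. Similarly, for $\mathbf{z}(t)$ and its
control points, $\mathbf{z}'(0) = 3(\mathbf{r}_1 - \mathbf{r}_0)$. By part (a),
$3(\mathbf{q}_3 - \mathbf{q}_2) = 3(\mathbf{r}_1 - \mathbf{r}_0)$. Replace $\mathbf{r}_0$ by $\mathbf{q}_3$, and obtain
$\mathbf{q}_3 - \mathbf{q}_2 = \mathbf{r}_1 - \mathbf{q}_3$, and hence $\mathbf{q}_3 = (\mathbf{q}_2 + \mathbf{r}_1)/2$.

c. Set $\mathbf{q}_0 = \mathbf{p}_0$ and $\mathbf{r}_3 = \mathbf{p}_3$. Compute $\mathbf{q}_1 = (\mathbf{p}_0 + \mathbf{p}_1)/2$
and $\mathbf{r}_2 = (\mathbf{p}_2 + \mathbf{p}_3)/2$. Compute $\mathbf{m} = (\mathbf{p}_1 + \mathbf{p}_2)/2$.
Compute $\mathbf{q}_2 = (\mathbf{q}_1 + \mathbf{m})/2$ and $\mathbf{r}_1 = (\mathbf{m} + \mathbf{r}_2)/2$.
Compute $\mathbf{q}_3 = (\mathbf{q}_2 + \mathbf{r}_1)/2$ and set $\mathbf{r}_0 = \mathbf{q}_3$.

17. a. $\mathbf{r}_0 = \mathbf{p}_0$, $\mathbf{r}_1 = \dfrac{\mathbf{p}_0 + 2\mathbf{p}_1}{3}$, $\mathbf{r}_2 = \dfrac{2\mathbf{p}_1 + \mathbf{p}_2}{3}$, $\mathbf{r}_3 = \mathbf{p}_2$

b. Hint: Write the standard formula (7) in this section,
with $\mathbf{r}_i$ in place of $\mathbf{p}_i$ for $i = 0, \ldots, 3$, and then replace
$\mathbf{r}_0$ and $\mathbf{r}_3$ by $\mathbf{p}_0$ and $\mathbf{p}_2$, respectively:

$$\mathbf{x}(t) = (1 - 3t + 3t^2 - t^3)\mathbf{p}_0 + (3t - 6t^2 + 3t^3)\mathbf{r}_1 + (3t^2 - 3t^3)\mathbf{r}_2 + t^3\mathbf{p}_2 \qquad \text{(iii)}$$

Use the formulas for $\mathbf{r}_1$ and $\mathbf{r}_2$ from part (a) to examine
the second and third terms in this expression for $\mathbf{x}(t)$.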
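The midpoint construction of Exercise 15(c) and the control points of Exercise 17(a) can both be checked with a short Python sketch. The function names, the helper `pt`, and the sample control points are hypothetical. The check for the two halves assumes the usual reparametrizations $\mathbf{y}(t) = \mathbf{x}(.5t)$ and $\mathbf{z}(t) = \mathbf{x}(.5 + .5t)$.

```python
from fractions import Fraction


def pt(*coords):
    """Build a point with exact rational coordinates."""
    return tuple(Fraction(c) for c in coords)


def combine(weights, points):
    """Linear combination sum_i weights[i] * points[i], coordinate by coordinate."""
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim))


def quadratic(points, t):
    """Quadratic Bezier curve with three control points."""
    return combine(((1 - t) ** 2, 2 * t * (1 - t), t**2), points)


def cubic(points, t):
    """Cubic Bezier curve with four control points."""
    w = ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t**2 * (1 - t), t**3)
    return combine(w, points)


def mid(p, q):
    """Midpoint of the segment from p to q."""
    return tuple((a + b) / 2 for a, b in zip(p, q))


def subdivide(points):
    """Exercise 15(c): control points of the two halves of a cubic curve."""
    p0, p1, p2, p3 = points
    q0, r3 = p0, p3
    q1, r2 = mid(p0, p1), mid(p2, p3)
    m = mid(p1, p2)
    q2, r1 = mid(q1, m), mid(m, r2)
    q3 = mid(q2, r1)
    r0 = q3
    return (q0, q1, q2, q3), (r0, r1, r2, r3)


def elevate(points):
    """Exercise 17(a): cubic control points for a quadratic curve."""
    p0, p1, p2 = points
    r1 = tuple((a + 2 * b) / 3 for a, b in zip(p0, p1))
    r2 = tuple((2 * b + c) / 3 for b, c in zip(p1, p2))
    return (p0, r1, r2, p2)


# Hypothetical control points.
P = (pt(0, 0), pt(1, 3), pt(4, 3), pt(6, 0))
left, right = subdivide(P)
Q = (pt(0, 0), pt(2, 4), pt(6, 0))
R = elevate(Q)

for k in range(5):
    t = Fraction(k, 4)
    assert cubic(left, t) == cubic(P, t / 2)         # y(t) = x(.5t)
    assert cubic(right, t) == cubic(P, (1 + t) / 2)  # z(t) = x(.5 + .5t)
    assert cubic(R, t) == quadratic(Q, t)            # same curve, one degree higher
```

Only averages of pairs of points are needed for the subdivision, which is why the construction is convenient for recursive rendering.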
Accelerator-multiplier model, 251n
Adjoint, classical, 179 
Adjugate, 179 
Adobe Illustrator, 481 
Affine combinations, 436-444 
definition of, 436 
of points, 436-439, 441-442
Affine coordinates, 447-451
Affine dependence, 445, 451
definition of, 444 

linear dependence and, 445-446, 452 
Affine hull (affine span), 437, 454
geometric view of, 441 
of two points, 446 
Affine independence, 444-454 
barycentric coordinates, 447-453 
definition of, 444 
Affine set, 439-441, 455
dimension of, 440 
intersection of, 456 
Affine transformation, 69 
Aircraft design, 91, 117 
Algebraic multiplicity of an eigenvalue, 
276 

Algebraic properties of $\mathbb{R}^n$, 27, 34
Algorithms 

bases for Col A, Row A, Nul A, 
230-233 

compute a B-matrix, 293 
decouple a system, 306, 315 
diagonalization, 283-285 
finding $A^{-1}$, 107-108
finding change-of-coordinates matrix, 
241 

Gram-Schmidt process, 354-360 
inverse power method, 322-324 
Jacobi’s method, 279 
LU factorization, 124-127 
QR algorithm, 279, 280, 324 
reduction to first-order system, 250 
row-column rule for computing AB, 
96 

row reduction, 15-17 
row-vector rule for computing Ax, 38 
singular value decomposition, 
418-419 

solving a linear system, 21 
steady-state vector, 257-258 


writing solution set in parametric 
vector form, 46 
Amps, 82 

Analysis of data, 123 

See also Matrix factorization 
(decomposition) 

Analysis of variance, 362-363 
Angles in $\mathbb{R}^2$ and $\mathbb{R}^3$, 335
Anticommutativity, 160 
Approximation, 269 
Area 

approximating, 183 
determinants as, 180-182 
ellipse, 184 

parallelogram, 180-181 
triangle, 185 

Argument of complex number, A6 
Associative law (multiplication), 97, 98 
Associative property (addition), 94 
Astronomy, barycentric coordinates in, 
448n 

Attractor, 304, 313 (fig.), 314 
Augmented matrix, 4 
Auxiliary equation, 248 
Average value, 381 
Axioms 

inner product space, 376 
vector space, 190 

B-coordinate vector, 154, 216-217 
B-coordinates, 216 
B-matrix, 290 
Back-substitution, 19-20 
Backward phase, 17, 20, 125
Balancing chemical equations, 51, 54 
Band matrix, 131 
Barycentric coordinates, 447-451 
in computer graphics, 449-451 
definition of, 447 

physical and geometric interpretations 
of, 448-449 
Basic variable, 18 
Basis, 148-150, 209, 225 
change of, 239-244 
change of, in $\mathbb{R}^n$, 241-242
column space, 149-150, 211-212, 
231-232 

coordinate systems, 216-222 


eigenspace, 268 
eigenvectors, 282, 285 
fundamental set of solutions, 312 
fundamental subspaces, 420-421 
null space, 211-212, 231-232 
orthogonal, 338-339, 354-356, 
377-378 

orthonormal, 342, 356-358, 397, 416
row space, 231-233 
solution space, 249 
spanning set, 210 
standard, 148, 209, 217, 342 
subspace, 148-150 
two views, 212-213 
Basis matrix, 485n 
Basis Theorem, 156, 227 
Beam model, 104 
Bessel’s inequality, 390 
Best approximation 
C[a, b], 386
Fourier, 387
$\mathbb{P}_4$, 378-379

to y by elements of W, 350 
Best Approximation Theorem, 350 
Bezier basis matrix, 485 
Bezier curves, 460, 481-492
approximations to, 487-488
in CAD programs, 487
in computer graphics, 481, 482
connecting two, 483-485
control points in, 481, 482, 488-489
cubic, 460, 481-482, 484, 485, 492
geometry matrix, 485
matrix equations for, 485-486
quadratic, 460, 481-482, 492
recursive subdivision of, 488-490 
tangent vectors and continuity, 483, 
491 

variation-diminishing property of, 488 
Bezier, Pierre, 481 
Bezier surfaces, 486-489 
approximations to, 487-488 
bicubic, 487, 489 
recursive subdivision of, 488-489 
variation-diminishing property of, 489 
Bidiagonal matrix, 131 
Bill of final demands, 132 
Blending polynomials, 485n 


Block matrix, 117
diagonal, 120 
multiplication, 118 
upper triangular, 119 
Boundary condition, 252 
Boundary point, 465 
Bounded set, 465 
Branch current, 83 
Branches in network, 52, 82 
B-splines, 484, 485, 490 
uniform, 491 

Budget constraint, 412-413 

C (language), 39, 100 
C[a,b], 196, 380-382, 386 
$\mathbb{C}^n$, 295

CAD programs, 487 
Cambridge Diet, 80-81, 86 
Caratheodory, Constantin, 457 
Caratheodory Theorem, 457-458 
Casorati matrix, 245-246 
Cauchy-Schwarz inequality, 379-380 
Cayley-Hamilton Theorem, 326 
Center of gravity (mass), 33 
Center of projection, 142 
CFD. See Computational fluid dynamics 
Change of basis, 239-244 
in $\mathbb{R}^n$, 241-242
Change of variable 

for complex eigenvalue, 299 
in differential equation, 315 
in dynamical system, 306-307 
in principal component analysis, 427 
in a quadratic form, 402-403 
Change-of-coordinates matrix, 219, 
240-241 

Characteristic equation of matrix, 
273-281, 295

Characteristic polynomial, 276, 279 
Characterization of Linearly Dependent 
Sets Theorem, 58, 60 
Chemical equations, 51, 54 
Cholesky factorization, 406, 432
Classical adjoint, 179 
Classification of States and Periodicity, 
10.4 

Closed set, 465, 466
Codomain, 63 
Coefficient 

correlation, 336 
filter, 246 
Fourier, 387 
of linear equation, 2 
matrix, 4 
regression, 369 
trend, 386 


Cofactor expansion, 165-166, 172 
Column space, 201-203 

basis for, 149-150, 211-212, 231-232 
dimension of, 228, 233 
least-squares problem, 360-362 
and null space, 202-204 
subspace, 147-148, 201 
See also Fundamental subspaces 
Column-row expansion, 119 
Column(s) 

augmented, 108 
determinants, 172 
operations, 172 
orthogonal, 364 
orthonormal, 343-344 
pivot, 14, 212, 233, A1 
span $\mathbb{R}^m$, 37
sum, 134 
vector, 24 

Comet, orbit of, 374 
Communication Classes, 10.3 
Commutativity, 98, 160 
Compact set, 465, 467 
Companion matrix, 327 
Complement, orthogonal, 334-335 
Complex eigenvalues, 315-317 
Complex number, A3-A7 
absolute value of, A4 
argument of, A6 
conjugate, A4 

geometric interpretation of, A5-A6 
polar coordinates, A6 
powers of, A7 
and $\mathbb{R}^2$, A7

real and imaginary axes, A5 
real and imaginary parts, A3 
Complex root, 248, 277, 295 
See also Auxiliary equation; 
Eigenvalue 
Complex vector, 24n 

real and imaginary parts, 297-298 
Complex vector space, 190n, 295, 308 
Component of y orthogonal to u, 340 
Composition of linear transformations, 
95, 128

Composition of mappings, 94, 140 
Computational fluid dynamics (CFD), 91 
Computer graphics, 138 

barycentric coordinates in, 449-451 
Bezier curves in, 481, 482 
center of projection, 142 
composite transformations, 140 
homogeneous coordinates, 139, 
141-142 

perspective projections, 142-144 
shear transformations, 139 


3D, 140-142 

Condition number, 114, 116, 176, 391 
singular value decomposition, 420 
Conformable partition, 118 
Conjugate pair, 298, A4 
Consistent system, 4, 7-8, 21 
matrix equation, 36 
Constant of adjustment, positive, 251 
Constrained optimization, 408-414 
eigenvalues, 409-410, 411-412
feasible set, 412 
indifference curve, 412-413 
See also Quadratic form 
Consumption matrix, 133, 134, 135 
Continuity of quadratic/cubic Bezier 
curves 

geometric ($G^0$, $G^1$) continuity, 483
parametric ($C^0$, $C^1$, $C^2$) continuity,
483, 484

Continuous dynamical systems, 266, 
311-319 

Continuous functions, 196, 205, 230, 
380-382, 387-388 
Contraction transformation, 66, 74 
Contrast between Nul A and Col A, 
202-203 

Control points, in Bezier curves, 460, 
481, 482, 488-489

Control system, 122, 189-190, 264, 301 
control sequence, 264 
controllable pair, 264 
Schur complement, 122 
space shuttle, 189-190 
state vector, 122, 254, 264 
state-space model, 264 
steady-state response, 301 
system matrix, 122 
transfer function, 122 
Controllability matrix, 264 
Convergence, 135, 258-259 
See also Iterative methods 
Convex combinations, 454-461 
convex sets, 455-459, 466-467,
470-473
definition of, 454
weights in, 454-455
Convex hull, 454, 472 

of Bezier curve control points, 

488 (fig.) 

of closed set, 465, 466 
of compact set, 465, 467 
geometric characterization of, 
456-457 
of open set, 465 
Convex set(s), 455-460 
disjoint closed, 466 (fig.) 




extreme point of, 470-473 
hyperplane separating, 466-467 
intersection of, 456 
profile of, 470, 472
See also Polytope(s) 

Coordinate mapping, 216-217, 219-222, 
239 

Coordinate system(s), 153-155, 216-222 
change of basis, 239-244 
graphical, 217-218 
isomorphism, 220-222 
polar, A6 
$\mathbb{R}^n$, 218-219

Coordinate vector, 154, 216-217 
Correlation coefficient, 336 
Cost vector, 31 
Counterexample, 61 
Covariance 

matrix, 425-427, 429 
Cramer’s rule, 177-180 
Cross product, 464 
Cross-product term, 401, 403 
Crystallography, 217-218 
Cube, 435, 436 

four-dimensional, 435 
Cubic curve 

Bezier, 460, 481-482, 484, 485, 
491-492 
Hermite, 485 

Cubic splines, natural, 481 
Current flow, 82 
Current law, 83 

Curve-fitting, 23, 371-372, 378-379 
Curves. See Bezier curves 

De Moivre’s Theorem, A7 
Decomposition 

eigenvector, 302, 319 
force, 342 

orthogonal, 339-340, 348 
polar, 432 

singular value, 414-424 
See also Factorization 
Decoupled system, 306, 312, 315 
Degenerate line, 69, 439 
Design matrix, 368 
Determinant, 163-187, 274-275 
adjugate, 179 
area and volume, 180-182 
Casoratian, 245 

characteristic equation, 276-277 
cofactor expansion, 165-166, 172 
column operations, 172 
Cramer’s rule, 177-180 
echelon form, 171 
eigenvalues, 276, 280 


elementary matrix, 173-174 
geometric interpretation, 180, 275 
(fig.) 

and inverse, 103, 171, 179-180 
linearity property, 173, 187 
multiplicative property, 173, 277 
$n \times n$ matrix, 165
product of pivots, 171, 274 
properties of, 275 
recursive definition, 165 
row operations, 169-170, 174 
symbolic, 464 
$3 \times 3$ matrix, 164
transformations, 182-184 
triangular matrix, 167, 275 
volume, 180-182, 275 
See also Matrix 
Diagonal entries, 92 
Diagonal matrix, 92, 120, 281-288, 
417-418 

Diagonal Matrix Representation 
Theorem, 291 
Diagonalizable matrix, 282 
distinct eigenvalues, 284-285 
nondistinct eigenvalues, 285-286 
orthogonally, 396 
Diagonalization Theorem, 282 
Difference equation, 80, 84-85, 244-253
dimension of solution space, 249 
eigenvectors, 271, 279, 301 
first-order, 250 
homogeneous, 246, 247-248 
linear, 246-249 

nonhomogeneous, 246, 249-250 
population model, 84-85 
recurrence relation, 84, 246, 248 
reduction to first order, 250 
signal processing, 246 
solution sets of, 247, 248-249, 250 
(fig.) 

stage-matrix model, 265-266 
state-space model, 264 
See also Dynamical system; Markov 
chain 

Differential equation, 204-205, 311-319 
circuit problem, 312-313, 316-317, 
318 

decoupled system, 312, 315 
eigenfunctions, 312 
initial value problem, 312 
solutions of, 312 
See also Laplace transform 
Differentiation, 205 
Digital signal processing. 

See Signal processing 
Dilation transformation, 66, 71 


Dimension of a flat (or a set), 440 
Dimension (vector space), 153-160,
225-228 

classification of subspaces, 226-227 
column space, 155, 228 
null space, 155, 228 
row space, 233-234 
subspace, 155-156 
Directed line segment, 25 
Direction 

of greatest attraction, 304, 314 
of greatest repulsion, 304, 314 
Discrete dynamical systems, 301-311 
Discrete linear dynamical system. 

See Dynamical system 
Discrete-time signal. See Signals 
Disjoint closed convex sets, 466 (fig.) 
Distance 

between vector and subspace, 
340-341, 351
between vectors, 332-333 
Distortion, 163 
Distributive laws, 97, 98 
Dodecahedron, 435, 436 
Domain, 63 
Dot product, 330 
Duality, 9.4 

Dynamical system, 265-266 
attractor, 304, 314 
change of variable, 306-307 
decoupling, 312, 315 
discrete, 301-311 
eigenvalues and eigenvectors, 
266-273, 278-279, 301-311
evolution of, 301 
graphical solutions, 303-305 
owl population model, 265-266, 
307-309 

predator-prey model, 302-303 
repeller, 304, 314 
saddle point, 304, 305 (fig.), 314 
spiral point, 317 
stage-matrix model, 265-266, 
307-309 

See also Difference equation; 
Mathematical model 

Earth Satellite Corporation, 394 
Eccentricity of orbit, 374 
Echelon form, 12, 13 

basis for row space, 231-233 
consistent system, 21 
determinant, 171, 274
flops, 20 

LU factorization, 124-126 
pivot positions, 14-15 
Edges of polyhedron, 470
Effective rank, 157, 236, 417
Eigenfunctions, 312, 315-316 
Eigenspace, 268-269 
dimension of, 285, 397 
orthogonal basis for, 397 
Eigenvalue, 266-273 

characteristic equation, 276-277, 295 
complex, 277, 295-301, 307, 315-317 
constrained optimization, 408 
determinants, 274-275, 280 
diagonalization, 281-288, 395-397 
differential equations, 312-314 
distinct, 284-285 
dynamical systems, 278-279, 301 
invariant plane, 300 
Invertible Matrix Theorem, 275 
iterative estimates, 277, 319-325 
multiplicity of, 276 
nondistinct, 285-286 
and quadratic forms, 405 
and rotation, 295, 297, 299-300, 

308 (fig.), 317 (fig.) 
row operations, 267, 277 
similarity, 277 
strictly dominant, 319 
triangular matrix, 269 
See also Dynamical system 
Eigenvector, 266-273 
basis, 282, 285 
complex, 295, 299 
decomposition, 302, 319 
diagonalization, 281-288, 395-397 
difference equations, 271 
dynamical system, 278-279, 301-311, 
312-314 

linear transformations and, 288-295 
linearly independent, 270, 282 
Markov chain, 279 
principal components, 427 
row operations, 267 
Electrical network model, 2, 82-83 
circuit problem, 312, 316-317, 318 
matrix factorization, 127-129 
minimal realization, 129 
Elementary matrix, 106-107 
determinant, 173-174 
interchange, 173 
reflector, 390 
row replacement, 173 
scale, 173 

Elementary reflector, 390 
Elementary row operation, 6, 106, 107 
Elements (Plato), 435 
Ellipse, 404 
area, 184 


singular values, 415-416 
Equal matrices, 93 
Equation 

auxiliary, 248 
characteristic, 276-277 
difference, 80, 84-85, 244-253 
differential, 204-205, 311-319 
ill-conditioned, 364 
of a line, 45, 69 
linear, 2-12, 45, 368-369 
normal, 329, 361-362, 364 
parametric, 44-46 
price, 137 
production, 133 
three-moment, 252 
vector, 24-34, 48 
Equilibrium prices, 49-51, 54 
Equilibrium, unstable, 310 
Equilibrium vector, 257-260 
Equivalence relation, 293 
Equivalent linear systems, 3 
Euler, Leonhard, 479
Existence and Uniqueness Theorem, 
21,43 

Existence of solution, 64, 73 
Existence questions, 7-9, 20-21, 36-37, 
64, 72, 113 

Explicit description, 44, 148, 200-201, 
203 

Extreme point, 470-473 


Faces of polyhedron, 470 
Facet of polytope, 470
Factorization 

analysis of a dynamical system, 281 
of block matrices, 120 
complex eigenvalue, 299 
diagonal, 281, 292 
for a dynamical system, 281 
in electrical engineering, 127-129 
See also Matrix factorization 
(decomposition); Singular value 
decomposition (SVD) 

Feasible set, 412 
Feynman, Richard, 163 
Filter coefficients, 246 
Filter, linear, 246 
low-pass, 247, 367 
moving average, 252 
Final demand vector, 132 
Finite set, 226 

Finite-dimensional vector space, 226 
subspace, 227-228 
First principal component, 393, 427 
First-order difference equation. 

See Difference equation 


Flat in $\mathbb{R}^n$, 440
Flexibility matrix, 104 
Flight control system, 189-190 
Floating point arithmetic, 9 
Floating point operation (flop), 9, 20 
Flow in network, 52-53, 54-55, 82 
Force, decomposition, 342 
Fortran, 39 

Forward phase, 17, 20 
Fourier approximation, 387 
Fourier coefficients, 387 
Fourier series, 387-388 
Free variable, 18, 21, 43, 228 
Full rank, 237 
Function, 63 

continuous, 380-382, 387-388 
eigenfunction, 312 
transfer, 122 
trend, 386 
utility, 412 

The Fundamental Matrix, 10.5 
Fundamental solution set, 249, 312 
Fundamental subspaces, 234 (fig.), 237, 
335 (fig.), 420-421 

Gauss, Carl Friedrich, 12n, 374n 
Gaussian elimination, 12n 
General least-squares problem, 360 
General linear model, 371 
General solution, 18, 249-250 
Geometric continuity, 483 
Geometric descriptions 
of $\mathbb{R}^2$, 25-26
of Span {u, v}, 30-31
of Span {v}, 30-31
Geometric point, 25 

Geometry matrix (of a Bezier curve), 485 
Geometry of vector spaces, 435-492 
affine combinations, 436-444 
affine independence, 444-454 
convex combinations, 454-461 
curves and surfaces, 481-492 
hyperplanes, 435, 440, 461-469 
polytopes, 469-481
Geometry vector, 486 
Givens rotation, 90 
Global Positioning System (GPS), 
329-330 

Gouraud shading, 487 
Gradient, 462 
Gram matrix, 432 
Gram-Schmidt process, 354-360, 
377-378 

in inner product spaces, 377-378 
Legendre polynomials, 383 
in $\mathbb{P}_4$, 378, 386
in $\mathbb{R}^n$, 355-356

Gram-Schmidt Process Theorem, 355 
Graphics, computer. 

See Computer graphics 

Heat conduction, 131 
Hermite cubic curve, 485 
Hermite polynomials, 229 
Hidden surfaces, 450 
Hilbert matrix, 116 
Homogeneous coordinates, 139-140, 
141-142 

Homogeneous forms and affine 
independence, 445, 452 
Homogeneous form of v in $\mathbb{R}^n$, 441-442
Homogeneous system, 43-44 
difference equations, 246 
in economics, 49-51 
subspace, 148, 199 
Hooke’s law, 104 
Householder matrix, 390 
reflection, 161 
Howard, Alan H., 80 
Hull, affine, 437, 454 
geometric view of, 441 
Hyperbola, 404 
Hypercube, 477-479 
construction of, 477-478 
Hyperplane(s), 435, 440, 461-469
definition of, 440
explicit descriptions of, 462-464
implicit descriptions of, 461-464
parallel, 462-464
separating sets of, 465-467 
supporting, 470 

Hyperspectral image processing, 429 

Icosahedron, 435, 436 
Identity matrix, 38, 97, 106 
Ill-conditioned matrix, 114, 391 
Ill-conditioned normal equation, 364 
Image processing, multichannel, 
393-394, 424-432 
Image, vector, 63 
Imaginary axis, A5 
Imaginary numbers, pure, A5 
Imaginary part 

complex number, A3 
complex vector, 297-298 
Implicit definition of Nul A, 148, 200, 
204 

Implicit description, 44, 263 
Inconsistent system, 4, 8 
See also Linear system 
Indexed set, 56, 208 
Indifference curve, 412-413 


Inequality 
Bessel’s, 390 

Cauchy-Schwarz, 379-380 
triangle, 380 

Infinite dimensional space, 226 
Infinite set, 225n 
Initial value problem, 312 
Inner product, 101, 330-331, 376 
angles, 335 
axioms, 376 
on C[a, b], 380-382 
evaluation, 380 
length/norm, 333, 377 
on $\mathbb{P}_n$, 377
properties, 331 

Inner product space, 376-390 
best approximation in, 378-379 
Cauchy-Schwarz inequality in, 
379-380 
definition of, 376 
distances in, 377 
in Fourier series, 387-388 
Gram-Schmidt process in, 377-378 
lengths (norms) in, 377 
orthogonality in, 377 
for trend analysis of data, 385-386 
triangle inequality in, 380 
weighted least-squares, 383-385 
Input sequence, 264 

See also Control system 
Interchange matrix, 106, 173 
Interior point, 465 
Intermediate demand, 132 
International Celestial Reference System, 
448n 

Interpolated colors, 449-450 
Interpolating polynomial, 23, 160 
Introduction and Examples, 10.1 
Invariant plane, 300 
Inverse, 103 

algorithm for, 107-108 
augmented columns, 108 
condition number, 114, 116 
determinant, 103 
elementary matrix, 106-107 
flexibility matrix, 104 
formula, 103, 179 
ill-conditioned matrix, 114 
linear transformation, 113 
Moore-Penrose, 422 
partitioned matrix, 119, 122 
product, 105 

stiffness matrix, 104-105 
transpose, 105 

Inverse power method, 322-324 
Invertible 


linear transformation, 113 
matrix, 103, 106-107, 171 
Invertible Matrix Theorem, 112-113, 
156, 157, 171, 235, 275, 421
Isomorphic vector spaces, 155, 230 
Isomorphism, 155, 220-222, 249, 378n 
Iterative methods 
eigenspace, 320-321 
eigenvalues, 277, 319-325 
formula for (/ _ C)~\ 134-135, 137 
inverse power method, 322-324 
Jacobi’s method, 279 
power method, 319-322 
QR algorithm, 279, 280, 324 

Jacobian matrix, 304n 
Jacobi’s method, 279 
Jordan form, 292 
Jordan, Wilhelm, 12n 
Junctions, 52 

k-crosspolytope, 480
Kernel, 203-205
k-face, 470

Kirchhoff's laws, 82, 83
k-pyramid, 480

Ladder network, 128-129, 130-131 

Laguerre polynomial, 229 

Lamberson, R., 265 

Landsat image, 393-394, 429, 430

LAPACK, 100, 120 

Laplace transform, 122, 178

Law of cosines, 335 

Leading entry, 12-13 

Leading variable, 18n 

Least-squares fit 

cubic trend, 372 (fig.) 
linear trend, 385-386 
quadratic trend, 385-386 
scatter plot, 371 
seasonal trend, 373, 375 (fig.) 
trend surface, 372 

Least-squares problem, 329, 360-375 
column space, 360-362 
curve-fitting, 371-372 
error, 363-364 
lines, 368-370 
mean-deviation form, 370 
multiple regression, 372-373 
normal equations, 329, 361-362, 370 
orthogonal columns, 364 
plane, 372-373 
QR factorization, 364-365 
residuals, 369 

singular value decomposition, 422 
Least-squares problem (continued)
sum of the squares for error, 375,
383-384 

weighted, 383-385 
See also Inner product space 
Least-squares solution, 330, 360, 422 
alternative calculation, 364-366 
minimum length, 422, 433
QR factorization, 364-365 
Left distributive law, 97 
Left singular vector, 417 
Left-multiplication, 98, 106, 107, 176, 
358 

Legendre polynomial, 383 
Length of vector, 331-332, 377 
singular values, 416 
Leontief, Wassily, 1, 132, 137n
exchange model, 49 
input-output model, 132-138 
production equation, 133 
Level set, 462 
Line(s) 

degenerate, 69, 439 
equation of, 2, 45 
explicit description of, 463 
as flat, 440 

geometric descriptions of, 440 
implicit equation of, 461 
parametric vector equation, 44 
of regression, 369 
Span {v}, 30
translation of, 45 
Line segment, 454 
Line segment, directed, 25 
Linear combination, 27-31, 35, 194 
affine combination. See Affine 
combinations 
in applications, 31 
weights, 27, 35, 201 

Linear dependence, 56-57, 58 (fig.), 208, 
444 

affine dependence and, 445-446, 452 
column space, 211-212 
row-equivalent matrices, A1 
row operations, 233 
Linear difference equation. See 
Difference equation 
Linear equation, 2-12 
See also Linear system 
Linear filter, 246 
Linear functionals, 461, 466, 472
maximum value of, 473 
Linear independence, 55-62, 208 
eigenvectors, 270 
matrix columns, 57, 77 
in $\mathbb{P}_3$, 220
in $\mathbb{R}^n$, 59

sets, 56, 208-216, 227 
signals, 245-246 
zero vector, 59 

Linear model. See Mathematical model 

Linear programming, 2 
partitioned matrix, 120 

Linear Programming-Geometric Method, 

9.2 

Linear Programming-Simplex Method, 

9.3 

Linear recurrence relation. See 
Difference equation 

Linear system, 2-3, 29, 35-36 
basic strategy for solving, 4-7
coefficient matrix, 4 
consistent/inconsistent, 4, 7-8 
equivalent, 3 

existence of solutions, 7-9, 20-21 
general solution, 18 
homogeneous, 43-44, 49-51 
linear independence, 55-62 
and matrix equation, 34-36 
matrix notation, 4 
nonhomogeneous, 44-46, 234 
over-/underdetermined, 23 
parametric solution, 19-20, 44 
solution sets, 3, 18-21, 43-49 
and vector equations, 29 
See also Linear transformation; 

Row operation 

Linear transformation, 62-80, 85, 
203-205, 248, 288-295 
B-matrix, 290, 292 
composite, 94, 140 
composition of, 95 
contraction/dilation, 66, 71 
of data, 67-68 
determinants, 182-184 
diagonal matrix representation, 291 
differentiation, 205 
domain/codomain, 63 
geometric, 72-75 
Givens rotation, 90 
Householder reflection, 161 
invertible, 113-114 
isomorphism, 220-222 
kernel, 203-205 
matrix of, 70-80, 289-290, 293 
null space, 203-205 
one-to-one/onto, 75-77 
projection, 75 
properties, 65 
on $\mathbb{R}^n$, 291-292
range, 63, 203-205 
reflection, 73, 161, 345-346 


rotation, 67 (fig.), 72 
shear, 65, 74, 139 
similarity, 277, 292-293 
standard matrix, 71-72 
vector space, 203-205, 290-291 
See also Isomorphism; Superposition 
principle 
Linear trend, 387 
Linearity property of determinant 
function, 173, 187 

Linearly dependent set, 56, 58, 60, 208 
Linearly independent eigenvectors, 270, 
282 

Linearly independent set, 56, 57-58, 
208-216 
See also Basis 
Long-term behavior 

of a dynamical system, 301 
of a Markov chain, 256, 259 
Loop current, 82 

Lower triangular matrix, 115, 124, 
125-126, 127 
Low-pass filter, 247, 367 
LU factorization, 92, 124-127, 130, 323 

196 

Macromedia Freehand, 481 
Main diagonal, 92 
Maple, 279 
Mapping, 63 

composition of, 94 
coordinate, 216-217, 219-222, 239 
eigenvectors, 290-291 
matrix factorizations, 288-289 
one-to-one, 75-77 
onto $\mathbb{R}^m$, 75, 77
signal processing, 248 
See also Linear transformation 
Marginal propensity to consume, 251 
Mark II computer, 1 
Markov chain, 253-262 
convergence, 258 
eigenvectors, 279 
predictions, 256-257 
probability vector, 254 
state vector, 254 

steady-state vector, 257-260, 279 
stochastic matrix, 254 
Markov Chains and Baseball Statistics, 
10.6 

Mass-spring system, 196, 205, 214 
Mathematica, 279 
Mathematical ecologists, 265 
Mathematical model, 1, 80-85 
aircraft, 91, 138 
beam, 104 
electrical network, 82 
linear, 80-85, 132, 266, 302, 371 
nutrition, 80-82 

population, 84-85, 254, 257-258 
predator-prey, 302-303 
spotted owl, 265-266 
stage-matrix, 265-266, 307-309 
See also Markov chain 
MATLAB, 23, 116, 130, 185, 262, 279, 
308, 323, 324, 327, 359 
Matrix, 92-161 

adjoint/adjugate, 179 
anticommuting, 160 
augmented, 4 
band, 131 
bidiagonal, 131 
block, 117 
Casorati, 245-246 

change-of-coordinates, 219, 240-241 

characteristic equation, 273-281 

coefficient, 4, 37 

of cofactors, 179 

column space, 201-203 

column sum, 134 

column vector, 24 

commutativity, 98, 103, 160 

companion, 327 

consumption, 133, 137 

controllability, 264 

covariance, 425-427 

design, 368 

diagonal, 92, 120 

diagonalizable, 282 

echelon, 14 

elementary, 106-107, 173-174, 390 

flexibility, 104 

geometry, 485 

Gram, 432 

Hilbert, 116 

Householder, 161, 390 

identity, 38, 92, 97, 106 

ill-conditioned, 114, 364 

interchange, 173 

inverse, 103 

invertible, 103, 105, 112-113 
Jacobian, 304n 
leading entry, 12-13 
of a linear transformation, 70-80, 
289-290 

migration, 85, 254, 279 
m × n, 4 

multiplication, 94-98, 118-119 
nonzero row/column, 13 
notation, 4 

null space, 147-148, 198-201 
of observations, 424 


orthogonal, 344, 395 
orthonormal, 344n 
orthonormal columns, 343-344 
partitioned, 117-123 
Pauli spin, 160 

positive definite/semidefinite, 406 
powers of, 98-99 
products, 94-98, 172-173 
projection, 398, 400 
pseudoinverse, 422 
of quadratic form, 401 
rank of, 153-160 
reduced echelon, 14 
regular stochastic, 258 
row equivalent, 6, 29n, A1 
row space, 231-233 
row-column rule, 96 
scalar multiple, 93-94 
scale, 173 

Schur complement, 122 

singular/nonsingular, 103, 113, 114 

size of, 4 

square, 111, 114 

standard, 71-72, 95 

stiffness, 104-105 

stochastic, 254, 261-262 

submatrix of, 117, 264 

sum, 93-94 

symmetric, 394-399 

system, 122 

trace of, 294, 426 

transfer, 128-129 

transpose of, 99-100, 105 

tridiagonal, 131 

unit cost, 67 

unit lower triangular, 124 
Vandermonde, 160, 186, 327 
zero, 92 

See also Determinant; Diagonalizable 
matrix; Inverse; Matrix 
factorization (decomposition); 

Row operation; Triangular matrix 
Matrix equation, 34-36 
Matrix factorization (decomposition), 92, 
123-132 

Cholesky, 406, 432 
complex eigenvalue, 299-300 
diagonal, 281-288, 291-292 
in electrical engineering, 127-129 
full QR, 359 

linear transformations, 288-295 
LU, 124-126 
permuted LU, 127 
polar, 432 

QR, 130, 356-358, 364-365 
rank, 130 


rank-revealing, 432 
reduced LU, 130 
reduced SVD, 422 
Schur, 391 

similarity, 277, 292-293 
singular value decomposition, 130, 
414-424 

spectral, 130, 398-399 
Matrix Games, 9.1 
Matrix inversion, 102-111 
Matrix multiplication, 94-98 
block, 118 

column-row expansion, 119 
and determinants, 172-173 
properties, 97-98 
row-column rule, 96 
See also Composition of linear 
transformations 

Matrix notation. See Back-substitution 
Matrix of coefficients, 4, 37 
Matrix of observations, 424 
Matrix program, 23 
Matrix transformation, 63-65, 71 
See also Linear transformation 
Matrix-vector product, 34-35 
properties, 39 
rule for computing, 38 
Maximum of quadratic form, 408-413 
Mean, sample, 425 
Mean square error, 388 
Mean-deviation form, 370, 425 
Microchip, 117 

Migration matrix, 85, 254, 279 
Minimal realization, 129 
Minimal representation of polytope, 
471-472, 474-475 
Minimum length solution, 433 
Minimum of quadratic form, 408-413 
Model, mathematical. See Mathematical 
model 

Modulus, A4 
Moebius, A.F., 448 
Molecular modeling, 140-141 
Moore-Penrose inverse, 422 
Moving average, 252 
Muir, Thomas, 163 
Multichannel image. 

See Image processing, multichannel 
Multiple regression, 372-373 
Multiplicative property of det, 173, 275 
Multiplicity of eigenvalue, 276 
Multivariate data, 424, 428-429 

NAD (North American Datum), 329, 330 
National Geodetic Survey, 329 
Natural cubic splines, 481 
Negative definite quadratic form, 405 
Negative flow, in a network branch, 82 
Negative of a vector, 191 
Negative semidefinite form, 405 
Network, 52-53 
branch, 82 
branch current, 83 
electrical, 82-83, 86-87, 127-129 
flow, 52-53, 54-55, 82 
loop currents, 82, 86-87 
Nodes, 52 
Noise, random, 252 
Nonhomogeneous system, 44-46, 234 
difference equations, 246, 249-250 
Nonlinear dynamical system, 304n 
Nonsingular matrix, 103, 113 
Nontrivial solution, 43 
Nonzero column, 12 
Nonzero row, 12 

Nonzero singular values, 416-417 
Norm of vector, 331-332, 377 
Normal equation, 329, 361-362 
ill-conditioned, 364 
Normal vector, 462 

North American Datum (NAD), 329, 330 
Null space, 147-148, 198-201 
basis, 149, 211-212, 231-232 
and column space, 202-203 
dimension of, 228, 233-234 
eigenspace, 268 
explicit description of, 200-201 
linear transformation, 203-205 
See also Fundamental subspaces; 

Kernel 
Nullity, 233 
Nutrition model, 80-82 

Observation vector, 368, 424-425 

Octahedron, 435, 436 

Ohm’s law, 82 

Oil exploration, 1 

One-to-one linear transformation, 

76, 215 

See also Isomorphism 
One-to-one mapping, 75-77 
Onto mapping, 75, 77 
Open ball, 465 
Open set, 465 
OpenGL, 481 
Optimization, constrained. 

See Constrained optimization 
Orbit of a comet, 374 
Ordered n-tuple, 27 
Ordered pair, 24 
Orthogonal 

eigenvectors, 395 


matrix, 344, 395 
polynomials, 378, 386 
regression, 432 
set, 338-339, 387 
vectors, 333-334, 377 
Orthogonal basis, 338-339, 377-378, 
397, 416 

for fundamental subspaces, 420-421 
Gram-Schmidt process, 354-356, 377 
Orthogonal complement, 334-335 
Orthogonal Decomposition Theorem, 

348 

Orthogonal diagonalization, 396 
principal component analysis, 427 
quadratic form, 402-403 
spectral decomposition, 398-399 
Orthogonal projection, 339-341, 
347-353 

geometric interpretation, 341, 349 
matrix, 351, 398, 400 
properties of, 350-352 
onto a subspace, 340, 347-348 
sum of, 341, 349 (fig.) 

Orthogonality, 333-334, 343 
Orthogonally diagonalizable, 396 
Orthonormal 

basis, 342, 351, 356-358 
columns, 343-344 
matrix, 344n 
rows, 344 
set, 342-344 

Outer product, 101, 119, 161, 238 
Overdetermined system, 23 
Owl population model, 265-266, 
307-309 

P, 193 

P_n, 192, 193, 209-210, 220-221 
dimension, 226 
inner product, 377 
standard basis, 209 
trend analysis, 386 
Parabola, 371 
Parallel 
line, 45 

processing, 1, 100 
solution sets, 45 (fig.), 46 (fig.), 249 
Parallel flats, 440 
Parallel hyperplanes, 462-464 
Parallelepiped, 180, 275 
Parallelogram 
area of, 180-181 
law, for vectors, 337 
region inside, 69, 183 
rule for addition, 26 
Parameter vector, 368 


Parametric 

continuity, 483, 484 
description, 19-20 
equation of a line, 44, 69 
equation of a plane, 44 
vector equation, 44-46 
vector form, 44, 46 
Partial pivoting, 17, 127 
Partitioned matrix, 91, 117-123 

addition and multiplication, 118-119 
algorithms, 120 
block diagonal, 120 
block upper triangular, 119 
column-row expansion, 119 
conformable, 118 
inverse of, 119-120, 122 
outer product, 119 
Schur complement, 122 
submatrices, 117 
Partitions, 117 
Paths, random, 163 
Pauli spin matrix, 160 
Pentatope, 476-477 
Permuted LU factorization, 127 
Perspective projection, 142-143 
Phase 

backward, 17, 125 
forward, 17 

Physics, barycentric coordinates in, 448 
Phong shading, 487 
Pivot, 15 

column, 14, 149-150, 212, 233, A1 
positions, 14-15 
product, 171, 274 
Pixel, 393 
Plane(s) 

geometric descriptions of, 440 
implicit equation of, 461 
Plato, 435 

Platonic solids, 435-436 
Point(s) 

affine combinations of, 437-439, 
441-442 
boundary, 465 
extreme, 470-473 
interior, 465 
Point masses, 33 
Polar coordinates, A6 
Polar decomposition, 432 
Polygon, 435-436, 470 
Polyhedron, 470 
regular, 435, 480 
Polynomial(s) 
blending, 485n 
characteristic, 276, 277 
degree of, 192 
Hermite, 229 

interpolating, 23, 160 

Laguerre, 229 

Legendre, 383 

orthogonal, 378, 386 

in P_n, 192, 193, 209-210, 220-221 

set, 192 

trigonometric, 387 
zero, 192 

Polytope(s), 469-481 

definitions, 470-471, 473 
explicit representation of, 473 
hypercube, 477-479 
implicit representation of, 473-474 
k-crosspolytope, 480 
k-pyramid, 480 

minimal representation of, 471-472, 
474-475, 479 
simplex, 435, 475-477 
Population model, 84-85, 253-254, 
257-258, 302-303, 307-309, 
310 

Positive definite matrix, 406 
Positive definite quadratic form, 405 
Positive semidefinite matrix, 406 
PostScript® fonts, 484-485, 492 
Power method, 319-322 
Powers of a complex number, A7 
Powers of a matrix, 98-99 
Predator-prey model, 302-303 
Predicted y-value, 369 
Preprocessing, 123 
Price equation, 137 
Price vector, 137 
Prices, equilibrium, 49-51, 54 
Principal Axes Theorem, 403 
Principal component analysis, 393-394, 
424, 427-428 
covariance matrix, 425 
first principal component, 427 
matrix of observations, 424 
multivariate data, 424, 428-429 
singular value decomposition, 429 
Probability vector, 254 
Process control data, 424 
Product 

of complex numbers, A7 
dot, 330 

of elementary matrices, 106, 174 
inner, 101, 330-331, 376 
of matrices, 94-98, 172-173 
of matrix inverses, 105 
of matrix transposes, 99-100 
matrix-vector, 34 
outer, 101, 119 
scalar, 101 


See also Column-row expansion; 
Inner product 
Production equation, 133 
Production vector, 132 
Profile, 470, 472 
Projection 

matrix, 398, 400 
perspective, 142-144 
transformations, 65, 75, 161 
See also Orthogonal projection 
Proper subset, 440n 
Properties 

determinants, 169-177 
inner product, 331, 376, 381 
linear transformation, 65-66, 76 
matrix addition, 93-94 
matrix inversion, 105 
matrix multiplication, 97-98 
matrix-vector product, Ax, 39-40 
orthogonal projections, 350-352 
of R^n, 27 
rank, 263 
transpose, 99-100 
See also Invertible Matrix Theorem 
Properties of Determinants Theorem, 275 
Pseudoinverse, 422, 433 
Public work schedules, 412-413 
feasible set, 412 
indifference curve, 412-413 
utility, 412 

Pure imaginary number, A5 
Pythagorean Theorem, 334, 350 
Pythagoreans, 435 

QR algorithm, 279, 280, 324 
QR factorization, 130, 356-358, 390 
Cholesky factorization, 432 
full QR factorization, 359 
least squares, 364-365 
QR Factorization Theorem, 357 
Quadratic Bezier curve, 460, 481-482, 
492 

Quadratic form, 401-408 
change of variable, 402-403 
classifying, 405-406 
cross-product term, 401 
indefinite, 405 

maximum and minimum, 408-413 
orthogonal diagonalization, 402-403 
positive definite, 405 
principal axes of, geometric view of, 
403-405 

See also Constrained optimization; 
Symmetric matrix 
Quadratic Forms and Eigenvalues 
Theorem, 405-406 


R^n, 27 

algebraic properties of, 27, 34 
change of basis, 241-242 
dimension, 226 
inner product, 330-331 
length (norm), 331-332 
quadratic form, 401 
standard basis, 209, 342 
subspace, 146-153, 348 
topology in, 465 
R^2 and R^3, 24-27, 193 
Random paths, 163 

Range of transformation, 63, 203-205, 
263 

Rank, 153-160, 230-238 
in control systems, 264 
effective, 157, 417 
estimation, 417n 
factorization, 130, 263-264 
full, 237 

Invertible Matrix Theorem, 157-158, 
235 

properties of, 263 
See also Outer product 
Rank Theorem, 156, 233-234 
Rank-revealing factorization, 432 
Rayleigh quotient, 324, 391 
Ray-tracing method, 450-451 
Ray-triangle intersections, 450-451 
Real axis, A5 
Real part 

complex number, A3 
complex vector, 297-298 
Real vector space, 190 
Rectangular coordinate system, 25 
Recurrence relation. See Difference 
equation 

Recursive subdivision of Bezier curves, 
surfaces, 488-489 
Reduced echelon form, 13, 14 
basis for null space, 200, 231-233 
solution of system, 18, 20, 21 
uniqueness of, A1 
Reduced LU factorization, 130 
Reduced singular value decomposition, 
422, 433 

Reduction to first-order equation, 250 
Reflection, 73, 345-346 
Householder, 161 
Reflector matrix, 161, 390 
Regression 

coefficients, 369 
line, 369 

multiple, 372-373 
orthogonal, 432 
Regular polyhedra, 435 
Regular polyhedron, 480 
Regular solids, 434 
Relative change, 391 
Relative error, 391 

See also Condition number 
Rendering graphics, 487 
Repeller, 304, 314 
Residual, 369, 371 
Resistance, 82 
RGB coordinates, 449-450 
Riemann sum, 381 
Right singular vector, 417 
Right distributive law, 97 
Right multiplication, 98, 176 
RLC circuit, 214-215 
Rotation due to a complex eigenvalue, 
297, 299-300, 308 (fig.) 

Rotation transformation, 67 (fig.), 72, 90, 
140, 141-142, 144 
Roundabout, 55 

Roundoff error, 9, 114, 269, 358, 417, 
420 

Row-column rule, 96 

Row equivalent matrices, 6, 13, 107, 

277, A1 

notation, 18, 29n 
Row operation, 6, 169-170 
back-substitution, 19-20 
basic/free variable, 18 
determinants, 169-170, 174, 275 
echelon form, 13 
eigenvalues, 267, 277 
elementary, 6, 106 
existence/uniqueness, 20-21 
inverse, 105, 107 

linear dependence relations, 150, 233 
pivot positions, 14-15 
rank, 236, 417 
See also Linear system 
Row reduction algorithm, 15-17 
backward phase, 17, 20, 125 
forward phase, 17, 20 
See also Row operation 
Row replacement matrix, 106, 173 
Row space, 231-233 
basis, 231-233 
dimension of, 233 
Invertible Matrix Theorem, 235 
See also Fundamental subspaces 
Row vector, 231 
Row-vector rule, 38 

S, 191, 244, 245-246 
Saddle point, 304, 305 (fig.), 307 (fig.), 
314 

Sample covariance matrix, 426 


Sample mean, 425 
Sample variance, 430-431 
Samuelson, P.A., 251n 
Scalar, 25, 190, 191 

Scalar multiple, 24, 27 (fig.), 93-94, 190 

Scalar product. See Inner product 

Scale a nonzero vector, 332 

Scale matrix, 173 

Scatter plot, 425 

Scene variance, 393-394 

Schur complement, 122 

Schur factorization, 391 

Second principal component, 427 

Series circuit, 128 

Set(s) 

affine, 439-441, 455, 456 
bounded, 465 
closed, 465, 466 
compact, 465, 467 
convex, 455-459 
level, 462 
open, 465 

vector. See Vector set 
Shear transformation, 65, 74, 139 
Shear-and-scale transformation, 145 
Shunt circuit, 128 
Signal processing, 246 
auxiliary equation, 248 
filter coefficients, 246 
fundamental solution set, 249 
linear difference equation, 246-249 
linear filter, 246 
low-pass filter, 247, 367 
moving average, 252 
reduction to first-order, 250 
See also Dynamical system 
Signals 

control systems, 189, 190 
discrete-time, 191-192, 244-245 
function, 189-190 
noise, 252 
sampled, 191, 244 
vector space, S, 191, 244 
Similar matrices, 277, 279, 280, 282, 
292-293 

See also Diagonalizable matrix 
Similarity transformation, 277 
Simplex, 475-477 

construction of, 475-476 
four-dimensional, 435 
Singular matrix, 103, 113, 114 
Singular value decomposition (SVD), 
130, 414-424 
condition number, 420 
estimating matrix rank, 157, 417 
fundamental subspaces, 420-421 


least-squares solution, 422 
m × n matrix, 416-417 
principal component analysis, 429 
pseudoinverse, 422 
rank of matrix, 417 
reduced, 422 
singular vectors, 417 
Singular Value Decomposition Theorem, 
417 

Sink of dynamical system, 314 
Size of a matrix, 4 
Solids, Platonic, 435-436 
Solution (set), 3, 18-21, 46, 248, 312 
of Ax = b, 441 

difference equations, 248-249, 271 
differential equations, 312 
explicit description of, 18, 44, 271 
fundamental, 249, 312 
general, 18, 43, 44-45, 249-250, 
302-303, 315 

geometric visualization, 45 (fig.), 46 
(fig.), 250 (fig.) 
homogeneous system, 43, 148, 
247-248 

minimum length, 433 
nonhomogeneous system, 44-46, 
249-250 
null space, 199 
parametric, 19-20, 44, 46 
row equivalent matrices, 6 
subspace, 148, 199, 248-249, 268, 312 
superposition, 83, 312 
trivial/nontrivial, 43 
unique, 7-9, 21, 75 
See also Least-squares solution 
Source of dynamical system, 314 
Space shuttle, 189-190 
Span, 30, 36-37 
affine, 437 

linear independence, 58 
orthogonal projection, 340 
subspace, 156 
Spanning set, 194, 212 
Spanning Set Theorem, 210-211 
Span {u, v} as a plane, 30 (fig.) 

Span {v} as a line, 30 (fig.) 

Span {v_1, ..., v_p}, 30, 194 
Sparse matrix, 91, 135, 172 
Spatial dimension, 425 
Spectral components, 425 
Spectral decomposition, 398-399 
Spectral dimension, 425 
Spectral factorization, 130 
Spectral Theorem, 397-398 
Spiral point, 317 




Splines, 490 

B-, 484, 485, 490, 491 
natural cubic, 481 
Spotted owl, 265-266, 301-302, 
307-309 

Square matrix, 111, 114 
Stage-matrix model, 265-266, 307-309 
Standard basis, 148, 209, 241, 342 
Standard matrix, 71-72, 95, 288 
Standard position, 404 
State vector, 122, 254, 264 
State-space model, 264, 301 
Steady-state 
heat flow, 131 
response, 301 
temperature, 11, 87, 131 
vector, 257-260, 266-267, 279 
The Steady-State Vector and Google’s 
PageRank, 10.2 
Stiffness matrix, 104-105 
Stochastic matrix, 254, 261-262, 
266-267 
regular, 258 

Strictly dominant eigenvalue, 319 
Strictly separate hyperplanes, 466 
Submatrix, 117, 264 
Subset, proper, 440n 
Subspace, 146-153, 193, 248 
basis for, 148-150, 209 
column space, 147-148, 201 
dimension of, 155-156, 226-227 
eigenspace, 268 

fundamental, 237, 335 (fig.), 420-421 
homogeneous system, 200 
intersection of, 197, 456 
linear transformation, 204 (fig.) 
null space, 147-148, 199 
spanned by a set, 147, 194 
sum, 197 
zero, 147, 193 
See also Vector space 
Sum of squares for error, 375, 383-384 
Superposition principle, 66, 83, 312 
Supporting hyperplane, 470 
Surface normal, 487 
Surface rendering, 144 
SVD. See Singular value decomposition 
(SVD) 

Symbolic determinant, 464 
Symmetric matrix, 324, 394-399 
diagonalization of, 395-397 
positive definite/semidefinite, 405 
spectral theorem for, 397-398 
See also Quadratic form 
Synthesis of data, 123 
System, linear. See Linear system 


System matrix, 122 

Tangent vector, 482-483, 490-492 
Tetrahedron, 185, 435, 436 
Theorem 

affine combination of points, 437-438 
Basis, 156, 227 
Best Approximation, 350 
Caratheodory, 457-458 
Cauchy-Schwarz Inequality, 379 
Cayley-Hamilton, 326 
Characterization of Linearly 
Dependent Sets, 58, 60, 208 
Column-Row Expansion of AB, 119 
Cramer’s Rule, 177 
De Moivre’s, A7 

Diagonal Matrix Representation, 291 
Diagonalization, 282 
Existence and Uniqueness, 21, 43 
Gram-Schmidt Process, 355 
Inverse Formula, 179 
Invertible Matrix, 112-113, 156-157, 
171, 235, 275, 421 
Multiplicative Property (of det), 173 
Orthogonal Decomposition, 348 
Principal Axes, 403 
Pythagorean, 334 
QR Factorization, 357 
Quadratic Forms and Eigenvalues, 
405-406 

Rank, 156, 233-234 
Row Operations, 169 
Singular Value Decomposition, 417 
Spanning Set, 210-211, 212 
Spectral, 397-398 
Triangle Inequality, 380 
Unique Representation, 216, 447 
Uniqueness of the Reduced Echelon 
Form, 13, A1 

Three-moment equation, 252 
Total variance, 426 
fraction explained, 428 
Trace of a matrix, 294, 426 
Trajectory, 303, 313 
Transfer function, 122 
Transfer matrix, 128 
Transformation 
affine, 69 
codomain, 63 
definition of, 63 
domain of, 63 
identity, 290 

image of a vector x under, 63 
range of, 63 

See also Linear transformation 
Translation, vector, 45 


in homogeneous coordinates, 139-140 
Transpose, 99-100 
conjugate, 391n 
of inverse, 105 
of matrix of cofactors, 179 
of product, 99 
properties of, 99-100 
Trend analysis, 385-386 
Trend surface, 372 
Triangle, area of, 185 
Triangle inequality, 380 
Triangular matrix, 5 
determinants, 167 
eigenvalues, 269 
lower, 115, 125-126, 127 
upper, 115, 119-120 
Tridiagonal matrix, 131 
Trigonometric polynomial, 387 
Trivial solution, 43 
TrueType® fonts, 492 

Uncorrelated variable, 427 
Underdetermined system, 23 
Uniform B-spline, 491 
Unique Representation Theorem, 216, 
447 

Unique vector, 197 

Uniqueness question, 7-9, 20-21, 64, 72 

Unit cell, 217-218 

Unit consumption vector, 132 

Unit cost matrix, 67 

Unit lower triangular matrix, 124 

Unit square, 72 

Unit vector, 332, 377, 408 

Unstable equilibrium, 310 

Upper triangular matrix, 115, 119-120 

Utility function, 412 

Value added vector, 137 
Vandermonde matrix, 160, 186, 327 
Variable, 18 
basic/free, 18 
leading, 18n 
uncorrelated, 427 
See also Change of variable 
Variance, 362-363, 375, 384n, 426 
sample, 430-431 
scene, 393-394 
total, 426 

Variation-diminishing property of Bezier 
curves and surfaces, 488 
Vector(s), 24 

addition/subtraction, 24, 25, 26, 27 
angles between, 335-336 
as arrows, 25 (fig.) 
column, 24 
complex, 24n 
coordinate, 154, 216-217 
cost, 31 

decomposing, 342 
distance between, 332-333 
equal, 24 

equilibrium, 257-260 
final demand, 132 
geometry, 486 
image, 63 
left singular, 417 
length/norm, 331-332, 377, 416 
linear combinations, 27-31, 60 
linearly dependent/independent, 
56-60 

negative, 191 
normal, 462 
normalizing, 332 
observation, 368, 424-425 
orthogonal, 333-334 
parameter, 368 
as a point, 25 (fig.) 
price, 137 
probability, 254 
production, 132 
in R^2, 24-26 
in R^3, 27 
in R^n, 27 

reflection, 345-346 
residual, 371 
singular, 417 
state, 122, 254, 264 


steady-state, 257-260, 266-267, 279 
sum, 24 

tangent, 482-483 
translations, 45 
unique, 197 
unit, 132, 332, 377 
value added, 137 
weights, 27 

zero, 27, 59, 146, 147, 190, 191, 334 
See also Eigenvector 
Vector addition, 25 
as translation, 45 
Vector equation 

linear dependence relation, 56-57 
parametric, 44, 46 
Vector set, 56-60, 338-346 
indexed, 56 

linear independence, 208-216, 
225-228 

orthogonal, 338-339, 395 
orthonormal, 342-344, 351, 356 
polynomial, 192, 193 
Vector space, 189-264 
of arrows, 191 
axioms, 191 
complex, 190n 

and difference equations, 248-250 
and differential equations, 204-205, 
312 

of discrete-time signals, 191-192 
finite-dimensional, 226, 227-228 
of functions, 192, 380 
infinite-dimensional, 226 


of polynomials, 192, 377 
real, 190n 

See also Geometry of vector spaces; 
Inner product space; Subspace 
Vector subtraction, 27 
Vector sum, 24 
Vertex/vertices, 138 
of polyhedron, 470-471 
Vibration of a weighted spring, 196, 205, 
214 

Viewing plane, 142 
Virtual reality, 141 
Volt, 82 
Volume 

determinants as, 180-182 
ellipsoid, 185 

parallelepiped, 180-181, 275 
tetrahedron, 185 

Weighted least squares, 376, 383-385 
Weights, 27, 35 

as free variables, 201 
Wire-frame approximation, 449 
Wire-frame models, 91, 138 

Zero functional, 461 
Zero matrix, 92 
Zero polynomial, 192 
Zero solution, 43 
Zero subspace, 147, 193 
Zero vector, 27, 59 
orthogonal, 334 
subspace, 147 
unique, 191, 197 